Abstract

We study a class of hypothesis testing problems in which, upon observing the realization of an n-dimensional Gaussian vector, one has to decide whether the vector was drawn from a standard normal distribution or, alternatively, whether there is a subset of the components, belonging to a certain given class of sets, whose elements have been "contaminated," that is, have a mean different from zero. We establish some general conditions under which testing is possible and others under which testing with a small risk is hopeless. The combinatorial and geometric structure of the class of sets is shown to play a crucial role. The bounds are illustrated on various examples.



1 Introduction 

In this paper we study the following hypothesis testing problem introduced by Arias-Castro, Candes, Helgason and Zeitouni [3]. One observes an n-dimensional vector $X = (X_1, \dots, X_n)$. The null hypothesis $H_0$ is that the components of $X$ are independent and identically distributed (i.i.d.) standard normal random variables. We denote the probability measure and expectation under $H_0$ by $\mathbb{P}_0$ and $\mathbb{E}_0$, respectively.

To describe the alternative hypothesis $H_1$, consider a class $\mathcal{C} = \{S_1, \dots, S_N\}$ of $N$ sets of indices such that $S_k \subset \{1, \dots, n\}$ for all $k = 1, \dots, N$. Under $H_1$, there exists an $S \in \mathcal{C}$ such that

$$X_i \text{ has distribution } \begin{cases} \mathcal{N}(0, 1) & \text{if } i \notin S, \\ \mathcal{N}(\mu, 1) & \text{if } i \in S, \end{cases}$$

where $\mu > 0$ is a positive parameter. The components of $X$ are independent under $H_1$ as well. The probability measure of $X$ defined this way by an $S \in \mathcal{C}$ is denoted by $\mathbb{P}_S$. Similarly, we write $\mathbb{E}_S$ for the expectation with respect to $\mathbb{P}_S$. Throughout we will assume that every $S \in \mathcal{C}$ has the same cardinality $|S| = K$.

A test is a binary-valued function $f : \mathbb{R}^n \to \{0, 1\}$. If $f(X) = 0$ then we say that the test accepts the null hypothesis, otherwise $H_0$ is rejected. One would like to design tests such that $H_0$ is accepted with a large probability when $X$ is distributed according to $\mathbb{P}_0$ and it is rejected when the distribution of $X$ is $\mathbb{P}_S$ for some $S \in \mathcal{C}$. Following Arias-Castro, Candes, Helgason, and Zeitouni [3], we consider the risk of a test $f$ measured by

$$R(f) = \mathbb{P}_0\{f(X) = 1\} + \frac{1}{N}\sum_{S \in \mathcal{C}} \mathbb{P}_S\{f(X) = 0\}. \qquad (1)$$
This measure of risk corresponds to the view that, under the alternative hypothesis, a set $S \in \mathcal{C}$ is selected uniformly at random and the components of $X$ belonging to $S$ have mean $\mu$. In the sequel, we refer to the first and second terms on the right-hand side of (1) as the type I and type II errors, respectively.
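As an illustration that is not part of the original analysis, the risk (1) of any test can be estimated by simulation when $\mathcal{C}$ is small enough to list explicitly. The sketch below assumes 0-based indices and a class given as a Python list of sets; the helper names `sample_x` and `estimate_risk` are hypothetical.

```python
import random
from typing import Callable, Iterable, List, Optional, Sequence, Set

def sample_x(n: int, mu: float, S: Optional[Iterable[int]], rng: random.Random) -> List[float]:
    """Draw X under P_0 (S is None) or under P_S (mean mu on the indices in S)."""
    support = set(S) if S is not None else set()
    return [rng.gauss(mu if i in support else 0.0, 1.0) for i in range(n)]

def estimate_risk(test: Callable[[Sequence[float]], int], n: int, mu: float,
                  classes: List[Set[int]], trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of R(f) = P_0{f(X)=1} + (1/N) sum_S P_S{f(X)=0}."""
    rng = random.Random(seed)
    # Type I error: fraction of null samples on which the test rejects.
    type1 = sum(test(sample_x(n, mu, None, rng)) for _ in range(trials)) / trials
    # Type II error: draw S uniformly from C, then X ~ P_S, and count acceptances.
    type2 = sum(1 - test(sample_x(n, mu, rng.choice(classes), rng))
                for _ in range(trials)) / trials
    return type1 + type2
```

Drawing $S$ uniformly and then $X \sim \mathbb{P}_S$ estimates the average over $S$ in the second term of (1) without looping over all of $\mathcal{C}$.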

We are interested in determining, or at least estimating, the value of $\mu$ under which the risk can be made small. Our aim is to understand the order of magnitude, when $n$ is large, as a function of $n$, $K$, and the structure of $\mathcal{C}$, of the smallest $\mu$ for which the risk can be made small. The value of $\mu$ for which the risk of the best possible test equals 1/2 is called critical.

Typically, the $n$ components of $X$ represent weights over the $n$ edges of a given graph $G$ and each $S \in \mathcal{C}$ is a subgraph of $G$. When $X_i \sim \mathcal{N}(\mu, 1)$, the edge $i$ is "contaminated" and we wish to test whether there is a subgraph in $\mathcal{C}$ that is entirely contaminated.

In [3] two examples were studied in detail. In one case $\mathcal{C}$ contains all paths between two given vertices in a two-dimensional grid and in the other $\mathcal{C}$ is the set of paths from the root to a leaf in a complete binary tree. In both cases the order of magnitude of the critical value of $\mu$ was determined. Arias-Castro, Candes, and Durand [4] investigate another class of examples in which elements of $\mathcal{C}$ correspond to clusters in a regular grid. Both [3] and [4] describe numerous practical applications of problems of this type.

Some other interesting examples are when $\mathcal{C}$ is

• the set of all subsets $S \subset \{1, \dots, n\}$ of size $K$;

• the set of all cliques of a given size in a complete graph;

• the set of all bicliques (i.e., complete bipartite subgraphs) of a given size in a complete bipartite graph;

• the set of all spanning trees of a complete graph;

• the set of all perfect matchings in a complete bipartite graph;

• the set of all sub-cubes of a given size of a binary hypercube.

The first of these examples, which lacks any combinatorial structure, has been studied in the rich literature on multiple testing; see, for example, Ingster [20], Baraud [5], Donoho and Jin [12], and the references therein.

As pointed out in [3], regardless of what $\mathcal{C}$ is, one may determine explicitly the test $f^*$ minimizing the risk. It follows from basic results of binary classification that for a given vector $x = (x_1, \dots, x_n)$, $f^*(x) = 1$ if and only if the ratio of the likelihoods of $x$ under $(1/N)\sum_{S \in \mathcal{C}} \mathbb{P}_S$ and $\mathbb{P}_0$ exceeds 1. Writing

$$\phi(x) = (2\pi)^{-n/2} e^{-\sum_{i=1}^n x_i^2/2} \quad \text{and} \quad \phi_S(x) = (2\pi)^{-n/2} e^{-\sum_{i \in S}(x_i - \mu)^2/2 - \sum_{i \notin S} x_i^2/2}$$

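for the probability densities of $\mathbb{P}_0$ and $\mathbb{P}_S$, respectively, the likelihood ratio at $x$ is

$$L(x) = \frac{\frac{1}{N}\sum_{S \in \mathcal{C}} \phi_S(x)}{\phi(x)} = \frac{1}{N}\sum_{S \in \mathcal{C}} e^{\mu x_S - K\mu^2/2},$$

where $x_S = \sum_{i \in S} x_i$. Thus, the optimal test is given by

$$f^*(x) = \mathbb{1}_{\{L(x) > 1\}} = \begin{cases} 0 & \text{if } \frac{1}{N}\sum_{S \in \mathcal{C}} e^{\mu x_S - K\mu^2/2} \le 1, \\ 1 & \text{otherwise.} \end{cases}$$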
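A minimal sketch of $f^*$ for an explicitly listed class follows. It works with $\log L(x)$ and a log-sum-exp step so that large values of $\mu x_S$ do not overflow. Enumerating $\mathcal{C}$ is only feasible for small classes; the function names and the 0-based indexing are illustrative assumptions.

```python
import math
from typing import Iterable, Sequence

def log_likelihood_ratio(x: Sequence[float], classes: Iterable[Iterable[int]], mu: float) -> float:
    """log L(x) with L(x) = (1/N) * sum_S exp(mu * x_S - K * mu^2 / 2)."""
    exponents = []
    for S in classes:
        idx = list(S)
        exponents.append(mu * sum(x[i] for i in idx) - len(idx) * mu * mu / 2)
    top = max(exponents)  # log-sum-exp: subtract the largest exponent first
    return top + math.log(sum(math.exp(e - top) for e in exponents)) - math.log(len(exponents))

def optimal_test(x: Sequence[float], classes, mu: float) -> int:
    """f*(x) = 1 iff L(x) > 1, i.e. iff log L(x) > 0."""
    return int(log_likelihood_ratio(x, classes, mu) > 0.0)
```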

The risk of $f^*$ (often called the Bayes risk) may then be written as

$$R^* = R^*_{\mathcal{C}}(\mu) = R(f^*) = 1 - \frac{1}{2}\mathbb{E}_0|L(X) - 1| = 1 - \frac{1}{2}\int \Big|\phi(x) - \frac{1}{N}\sum_{S \in \mathcal{C}} \phi_S(x)\Big|\, dx.$$



We are interested in the behavior of $R^*$ as a function of $\mathcal{C}$ and $\mu$. Clearly, $R^*$ is a monotone decreasing function of $\mu$. For $\mu$ sufficiently large, $R^*$ is close to zero while for very small values of $\mu$, $R^*$ is near its maximum value 1, indicating that testing is virtually impossible. Our aim is to understand for what values of $\mu$ the transition occurs. This depends on the combinatorial and geometric structure of the class $\mathcal{C}$. We describe various general conditions in both directions and illustrate them on examples.

Remark. (an alternative risk measure.) Arias-Castro, Candes, Helgason, and Zeitouni [3] also consider the risk measure

$$\bar{R}(f) = \mathbb{P}_0\{f(X) = 1\} + \max_{S \in \mathcal{C}} \mathbb{P}_S\{f(X) = 0\}.$$

Clearly, $\bar{R}(f) \ge R(f)$, and when there is sufficient symmetry in $f$ and $\mathcal{C}$, we have equality. However, there are significant differences between the two measures of risk. The alternative measure $\bar{R}$ obviously satisfies the following monotonicity property: for a class $\mathcal{C}$ and parameter $\mu > 0$, let $\bar{R}_{\mathcal{C}}(\mu)$ denote the smallest achievable risk. If $\mathcal{A} \subset \mathcal{C}$ are two classes then for any $\mu$, $\bar{R}_{\mathcal{A}}(\mu) \le \bar{R}_{\mathcal{C}}(\mu)$. In contrast to this, the "Bayesian" risk measure $R(f)$ does not satisfy such a monotonicity property, as is shown in Section 5. In this paper we focus on the risk measure $R(f)$.

Plan of the paper. The paper is organized as follows. In Section 2 we briefly discuss two suboptimal but simple and general testing rules (the maximum test and the averaging test) that imply sufficient conditions for testability that turn out to be useful in many examples.

In Section 3 a few general sufficient conditions are derived for the impossibility of testing under symmetry assumptions for the class.

In Section 4 we work out several concrete examples, including the class of all K-sets, the class of all cliques of a certain size in a complete graph, the class of all perfect matchings in the complete bipartite graph, and the class of all spanning trees in a complete graph.

In Section 5 we show that, perhaps surprisingly, the optimal risk is not monotone in the sense that larger classes may be significantly easier to test than small ones, though monotonicity holds under certain symmetry conditions.

In the last two sections of the paper we use techniques developed in the theory of Gaussian processes to establish upper and lower bounds related to geometrical properties of the class $\mathcal{C}$. In Section 6 general lower bounds are derived in terms of random subclasses and metric entropies of the class $\mathcal{C}$. Finally, in Section 7 we take a closer look at the type I error of the optimal test and prove an upper bound that, in certain situations, is significantly tighter than the natural bound obtained for a general-purpose maximum test.

2 Simple tests and upper bounds 

As mentioned in the introduction, the test $f^*$ minimizing the risk is explicitly determined. However, the performance of this test is not always easy to analyze. Moreover, efficient computation of the optimal test is often a non-trivial problem, though efficient algorithms are available in many interesting cases. (We discuss computational issues for the examples of Section 4.) For these reasons it is often useful to consider simpler, though suboptimal, tests. In this section we briefly discuss two simple tests, a test based on averaging and a test based on maxima. These are often easier to analyze and help understand the behavior of the optimal test as well. In many cases one of these tests turns out to have near-optimal performance.

A simple test based on averaging 

Perhaps the simplest possible test is based on the fact that the sum of the components of $X$ is zero-mean normal under $\mathbb{P}_0$ and has mean $\mu K$ under the alternative hypothesis. Thus, it is natural to consider the averaging test

$$f(x) = \mathbb{1}_{\{\sum_{i=1}^n x_i > \mu K/2\}}.$$

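Proposition 1 Let $\delta > 0$. The risk of the averaging test $f$ satisfies $R(f) \le \delta$ whenever

$$\mu \ge \sqrt{\frac{8n}{K^2}\log\frac{2}{\delta}}.$$

Proof: Observe that under $\mathbb{P}_0$, the statistic $\sum_{i=1}^n X_i$ has normal $\mathcal{N}(0, n)$ distribution while for each $S \in \mathcal{C}$, under $\mathbb{P}_S$, it is distributed as $\mathcal{N}(\mu K, n)$. Applying the Gaussian tail bound $\mathbb{P}\{\mathcal{N}(0, n) > t\} \le e^{-t^2/(2n)}$ with $t = \mu K/2$ to both errors gives $R(f) \le 2e^{-(\mu K)^2/(8n)}$. □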
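The averaging test and the quantities in Proposition 1 translate directly into code. This is a sketch with hypothetical helper names; the threshold is simply the bound $2e^{-(\mu K)^2/(8n)} \le \delta$ solved for $\mu$.

```python
import math
from typing import Sequence

def averaging_test(x: Sequence[float], mu: float, K: int) -> int:
    """f(x) = 1 iff sum_i x_i > mu * K / 2."""
    return int(sum(x) > mu * K / 2)

def averaging_risk_bound(mu: float, n: int, K: int) -> float:
    """Upper bound R(f) <= 2 exp(-(mu K)^2 / (8 n)) from the proof of Proposition 1."""
    return 2 * math.exp(-(mu * K) ** 2 / (8 * n))

def averaging_threshold(n: int, K: int, delta: float) -> float:
    """Smallest mu allowed by Proposition 1: sqrt(8 n log(2/delta)) / K."""
    return math.sqrt(8 * n * math.log(2 / delta)) / K
```

For perfect matchings ($n = m^2$, $K = m$) the threshold reduces to $\sqrt{8\log(2/\delta)}$, independent of $m$.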



A test based on maxima 

Another natural test is based on the fact that under the alternative hypothesis, for some $S \in \mathcal{C}$, $X_S = \sum_{i \in S} X_i$ is normal $\mathcal{N}(\mu K, K)$. Consider the maximum test

$$f(x) = 1 \quad \text{if and only if} \quad \max_{S \in \mathcal{C}} x_S > \frac{\mu K + \mathbb{E}_0 \max_{S \in \mathcal{C}} X_S}{2}.$$

The test statistic $\max_{S \in \mathcal{C}} X_S$ is often referred to as a scan statistic and has been thoroughly studied for a wide range of applications; see Glaz, Naus, and Wallenstein [IB]. Here we only need the following simple observation.

Proposition 2 The risk of the maximum test $f$ satisfies $R(f) \le \delta$ whenever

$$\mu \ge \frac{\mathbb{E}_0 \max_{S \in \mathcal{C}} X_S}{K} + 2\sqrt{\frac{2}{K}\log\frac{2}{\delta}}.$$
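A brute-force sketch of the maximum test for a class listed explicitly (0-based indices). Following the remark after the proof of Proposition 2, `emax_bound` may be any upper bound on $\mathbb{E}_0 \max_S X_S$, such as $\sqrt{2K\log N}$; all names are hypothetical.

```python
import math
from typing import Iterable, Sequence

def scan_statistic(x: Sequence[float], classes: Iterable[Iterable[int]]) -> float:
    """max over S in C of x_S = sum_{i in S} x_i (brute force over an explicit class)."""
    return max(sum(x[i] for i in S) for S in classes)

def maximum_test(x, classes, mu: float, K: int, emax_bound: float) -> int:
    """f(x) = 1 iff max_S x_S > (mu K + emax_bound) / 2.

    emax_bound is E_0 max_S X_S or any upper bound on it; the same value
    must then be used in maximum_test_threshold below.
    """
    return int(scan_statistic(x, classes) > (mu * K + emax_bound) / 2)

def maximum_test_threshold(emax_bound: float, K: int, delta: float) -> float:
    """mu from Proposition 2: emax_bound / K + 2 sqrt((2/K) log(2/delta))."""
    return emax_bound / K + 2 * math.sqrt(2 / K * math.log(2 / delta))
```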

In the analysis it is convenient to use the following simple Gaussian concentration inequality; see Tsirelson, Ibragimov, and Sudakov [2"5].

Lemma 3 (Tsirelson's inequality.) Let $X = (X_1, \dots, X_n)$ be a vector of $n$ independent standard normal random variables. Let $f : \mathbb{R}^n \to \mathbb{R}$ denote a Lipschitz function with Lipschitz constant $L$ (with respect to the Euclidean distance). Then for all $t > 0$,

$$\mathbb{P}\{f(X) - \mathbb{E}f(X) \ge t\} \le e^{-t^2/(2L^2)}.$$

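Proof of Proposition 2: Simply note that under the null hypothesis, for each $S \in \mathcal{C}$, $X_S$ is a zero-mean normally distributed random variable with variance $K = |S|$. Since $\max_{S \in \mathcal{C}} X_S$ is a Lipschitz function of $X$ with Lipschitz constant $\sqrt{K}$, by Tsirelson's inequality, for all $t > 0$,

$$\mathbb{P}_0\Big\{\max_{S \in \mathcal{C}} X_S \ge \mathbb{E}_0 \max_{S \in \mathcal{C}} X_S + t\Big\} \le e^{-t^2/(2K)}.$$

On the other hand, under $\mathbb{P}_S$ for a fixed $S \in \mathcal{C}$,

$$\max_{S' \in \mathcal{C}} X_{S'} \ge X_S \sim \mathcal{N}(\mu K, K)$$

and therefore

$$\mathbb{P}_S\Big\{\max_{S' \in \mathcal{C}} X_{S'} \le \mu K - t\Big\} \le e^{-t^2/(2K)}.$$

Taking $t = (\mu K - \mathbb{E}_0 \max_{S \in \mathcal{C}} X_S)/2$, both errors are at most $\delta/2$ under the stated condition on $\mu$, which completes the proof. □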

The maximum test is often easier to compute than the optimal test $f^*$, though maximization is not always possible in polynomial time. If the value of $\mathbb{E}_0 \max_{S \in \mathcal{C}} X_S$ is not exactly known, one may replace it in the definition of $f$ by any upper bound, and then the same upper bound will appear in the performance bound.

Proposition 2 shows that the maximum test is guaranteed to work whenever $\mu$ is at least $\mathbb{E}_0 \max_{S \in \mathcal{C}} X_S / K + \text{const.}/\sqrt{K}$. Thus, in order to better understand the behavior of the maximum test (and thus obtain sufficient conditions for the optimal test to have a low risk), one needs to understand the expected value of $\max_{S \in \mathcal{C}} X_S$ (under $\mathbb{P}_0$). As maxima of Gaussian processes have been studied extensively, there are plenty of directly applicable results available for expected maxima. The textbook of Talagrand [21] is dedicated to this topic. Here we only recall some of the basic facts.

First note that one always has $\mathbb{E}_0 \max_{S \in \mathcal{C}} X_S \le \sqrt{2K\log N}$, but sharper bounds can be derived by chaining arguments; see Talagrand [55] for an elegant and advanced treatment. The classical chaining bound of Dudley [13] works as follows. Introduce a metric on $\mathcal{C}$ by

$$d(S, T) = \sqrt{\mathbb{E}_0 (X_S - X_T)^2} = \sqrt{d_H(S, T)}, \qquad S, T \in \mathcal{C},$$

where $d_H(S, T) = \sum_{i=1}^n \mathbb{1}_{\{\mathbb{1}_{\{i \in S\}} \ne \mathbb{1}_{\{i \in T\}}\}}$ denotes the Hamming distance. For $t > 0$, let $N(t)$ denote the $t$-covering number of $\mathcal{C}$ with respect to the metric $d$, that is, the smallest number of open balls of radius $t$ that cover $\mathcal{C}$. By Dudley's theorem, there exists a numerical constant $C$ such that

$$\mathbb{E}_0 \max_{S \in \mathcal{C}} X_S \le C\int_0^{\mathrm{diam}(\mathcal{C})} \sqrt{\log N(t)}\, dt,$$

where $\mathrm{diam}(\mathcal{C}) = \max_{S, T \in \mathcal{C}} d(S, T)$ denotes the diameter of the metric space $\mathcal{C}$. Note that since $|S| = K$ for all $S \in \mathcal{C}$, $\mathrm{diam}(\mathcal{C}) \le \sqrt{2K}$. Dudley's theorem is not optimal but it is relatively easy to use. It has been refined, based on "majorizing measures" or "generic chaining," which gives sharp bounds; see, for example, Talagrand [25].
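To make the metric concrete, the sketch below computes $d(S, T)$, a greedy $t$-cover whose size upper-bounds $N(t)$, and a Riemann sum for the Dudley integral. It is an illustration under stated assumptions: the greedy cover can be larger than the true covering number, the unspecified constant $C$ is omitted, and the names are hypothetical.

```python
import math
from typing import FrozenSet, List

def d(S: FrozenSet[int], T: FrozenSet[int]) -> float:
    """d(S, T) = sqrt(Hamming distance) = sqrt(|S symmetric-difference T|)."""
    return math.sqrt(len(S ^ T))

def greedy_cover_size(classes: List[FrozenSet[int]], t: float) -> int:
    """Size of a greedy t-cover with centers in C: an upper bound on N(t).

    A set is covered if it lies at distance < t (open ball) from an existing center;
    otherwise it becomes a new center.
    """
    centers: List[FrozenSet[int]] = []
    for S in classes:
        if all(d(S, c) >= t for c in centers):
            centers.append(S)
    return len(centers)

def dudley_sum(classes: List[FrozenSet[int]], K: int, steps: int = 200) -> float:
    """Midpoint Riemann sum of int_0^{sqrt(2K)} sqrt(log N(t)) dt, N(t) replaced by
    its greedy upper bound; multiply by Dudley's constant C for the actual bound."""
    h = math.sqrt(2 * K) / steps
    return sum(math.sqrt(math.log(greedy_cover_size(classes, (j + 0.5) * h))) * h
               for j in range(steps))
```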

Remark. (the vc dimension.) In certain cases it is convenient to further bound Dudley's inequality in terms of the vc dimension [30]. Recall that the vc dimension $V(\mathcal{C})$ of $\mathcal{C}$ is the largest positive integer $m$ such that there exists an $m$-element set $\{i_1, \dots, i_m\} \subset \{1, \dots, n\}$ such that for all $2^m$ subsets $A \subset \{i_1, \dots, i_m\}$ there exists an $S \in \mathcal{C}$ such that $S \cap \{i_1, \dots, i_m\} = A$. Haussler [T5] proved that the covering numbers of $\mathcal{C}$ may be bounded as

$$N(t) \le e(V(\mathcal{C}) + 1)\left(\frac{2en}{t^2}\right)^{V(\mathcal{C})},$$

so by Dudley's bound,

$$\mathbb{E}_0 \max_{S \in \mathcal{C}} X_S \le C\sqrt{V(\mathcal{C})\, K\log n}.$$



3 Lower bounds 

In this section we investigate conditions under which the risk of any test is large. We start with a simple universal bound that implies that, regardless of what the class $\mathcal{C}$ is, small risk cannot be achieved unless $\mu$ is substantially large compared to $K^{-1/2}$.



A universal lower bound 

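An often convenient way of bounding the Bayes risk $R^*$ is in terms of the Bhattacharyya measure of affinity (Bhattacharyya [7])

$$\rho = \rho_{\mathcal{C}}(\mu) = \frac{1}{2}\mathbb{E}_0\sqrt{L(X)}.$$

It is well known (see, e.g., [11, Theorem 3.1]) that

$$1 - \sqrt{1 - 4\rho^2} \le R^* \le 2\rho.$$

Thus, $2\rho$ essentially behaves as the Bayes error in the sense that $R^*$ is near 1 when $2\rho$ is near 1, and is small when $2\rho$ is small. Observe that, by Jensen's inequality (concavity of the square root),

$$2\rho = \mathbb{E}_0\sqrt{L(X)} = \int \sqrt{\frac{1}{N}\sum_{S \in \mathcal{C}} \phi_S(x)\phi(x)}\, dx \ge \frac{1}{N}\sum_{S \in \mathcal{C}} \int \sqrt{\phi_S(x)\phi(x)}\, dx.$$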
Straightforward calculation shows that for any $S \in \mathcal{C}$,

$$\int \sqrt{\phi_S(x)\phi(x)}\, dx = e^{-\mu^2 K/8},$$

and therefore $2\rho \ge e^{-\mu^2 K/8}$; combined with $R^* \ge 1 - \sqrt{1 - 4\rho^2}$, this gives the following.

Proposition 4 For all classes $\mathcal{C}$, $R^* \ge 1/2$ whenever $\mu \le \sqrt{(4/K)\log(4/3)}$.

This shows that no matter what the class $\mathcal{C}$ is, detection is hopeless if $\mu$ is of the order of $K^{-1/2}$. This classical fact goes back to Le Cam [22].
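A quick numerical check of Proposition 4, replacing $2\rho$ by its lower bound $e^{-\mu^2 K/8}$ (the function names are illustrative):

```python
import math

def bhattacharyya_lower(mu: float, K: int) -> float:
    """Lower bound 2*rho >= exp(-mu^2 K / 8) from the Jensen step."""
    return math.exp(-mu * mu * K / 8)

def bayes_risk_lower(mu: float, K: int) -> float:
    """R* >= 1 - sqrt(1 - (2 rho)^2), with 2 rho replaced by its lower bound.

    This is valid because 1 - sqrt(1 - u^2) is increasing in u on [0, 1].
    """
    two_rho = bhattacharyya_lower(mu, K)
    return 1 - math.sqrt(1 - two_rho ** 2)

def universal_threshold(K: int) -> float:
    """Proposition 4: R* >= 1/2 whenever mu <= sqrt((4/K) log(4/3))."""
    return math.sqrt(4 / K * math.log(4 / 3))
```

At $\mu = \sqrt{(4/K)\log(4/3)}$ we get $(2\rho)^2 \ge 3/4$, so the lower bound equals exactly 1/2 for every $K$.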

A lower bound based on overlapping pairs 

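The next lemma is due to Arias-Castro, Candes, Helgason, and Zeitouni [3]. For completeness we recall their proof.

Proposition 5 Let $S$ and $S'$ be drawn independently and uniformly at random from $\mathcal{C}$ and let $Z = |S \cap S'|$. Then

$$R^* \ge 1 - \frac{1}{2}\sqrt{\mathbb{E}e^{\mu^2 Z} - 1}.$$

Proof: As noted above, $R^* = 1 - \frac{1}{2}\mathbb{E}_0|L(X) - 1|$, so by the Cauchy-Schwarz inequality,

$$R^* \ge 1 - \frac{1}{2}\sqrt{\mathbb{E}_0|L(X) - 1|^2}.$$

Since $\mathbb{E}_0 L(X) = 1$,

$$\mathbb{E}_0|L(X) - 1|^2 = \mathrm{Var}_0(L(X)) = \mathbb{E}_0[L(X)^2] - 1.$$

However, by definition $L(X) = \frac{1}{N}\sum_{S \in \mathcal{C}} e^{\mu X_S - K\mu^2/2}$, so we have

$$\mathbb{E}_0[L(X)^2] = \frac{1}{N^2}\sum_{S, S' \in \mathcal{C}} e^{-K\mu^2}\, \mathbb{E}_0 e^{\mu(X_S + X_{S'})}.$$

But

$$\mathbb{E}_0 e^{\mu(X_S + X_{S'})} = \mathbb{E}_0 e^{\mu\left(\sum_{i \in S \triangle S'} X_i + 2\sum_{i \in S \cap S'} X_i\right)} = \left(\mathbb{E}_0 e^{\mu X_1}\right)^{2(K - |S \cap S'|)}\left(\mathbb{E}_0 e^{2\mu X_1}\right)^{|S \cap S'|} = e^{\mu^2(K - |S \cap S'|) + 2\mu^2|S \cap S'|},$$

so that $\mathbb{E}_0[L(X)^2] = \mathbb{E}e^{\mu^2 Z}$, and the statement follows. □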
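For small explicit classes, the quantity in Proposition 5 can be computed exactly by enumerating all $N^2$ pairs $(S, S')$. A sketch with 0-based indices and hypothetical names:

```python
import math
from itertools import product
from typing import List, Set

def overlap_mgf(classes: List[Set[int]], mu: float) -> float:
    """E exp(mu^2 Z), Z = |S ∩ S'| for S, S' independent and uniform on C (exact)."""
    N = len(classes)
    return sum(math.exp(mu * mu * len(S & T)) for S, T in product(classes, repeat=2)) / N ** 2

def prop5_lower_bound(classes: List[Set[int]], mu: float) -> float:
    """R* >= 1 - (1/2) sqrt(E exp(mu^2 Z) - 1)  (Proposition 5)."""
    return 1 - 0.5 * math.sqrt(overlap_mgf(classes, mu) - 1)
```

For $N$ disjoint sets this reproduces $\mathbb{E}e^{\mu^2 Z} = 1 - 1/N + e^{\mu^2 K}/N$, the computation used for disjoint sets in Section 4.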

The beauty of this lemma is that it reduces the problem to studying a purely combinatorial quantity. By deriving upper bounds for the moment generating function of the overlap $|S \cap S'|$ between two elements of $\mathcal{C}$ drawn independently and uniformly at random, one obtains lower bounds for the critical value of $\mu$. This simple lemma turns out to be surprisingly powerful, as will be illustrated in various applications below.

A lower bound for symmetric classes 

We begin by deriving some simple consequences of Proposition 5 under some general symmetry conditions on the class $\mathcal{C}$. The following proposition shows that the universal bound of Proposition 4 can be improved by a factor of $\sqrt{\log(1 + n/K)}$ for all sufficiently symmetric classes.

Proposition 6 Let $\delta \in (0, 1)$. Assume that $\mathcal{C}$ satisfies the following conditions of symmetry. Let $S, S'$ be drawn independently and uniformly at random from $\mathcal{C}$. Assume that (i) the conditional distribution of $Z = |S \cap S'|$ given $S'$ is identical for all values of $S'$; (ii) for any fixed $S' \in \mathcal{C}$ and $i \in S'$, $\mathbb{P}\{i \in S\} = K/n$. Then $R^* \ge \delta$ for all $\mu$ with

$$\mu \le \sqrt{\frac{1}{K}\log\left(1 + \frac{4n(1 - \delta)^2}{K}\right)}.$$



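Proof: We apply Proposition 5. By the first symmetry assumption it suffices to derive a suitable upper bound for $\mathbb{E}[e^{\mu^2 Z}] = \mathbb{E}[e^{\mu^2 Z} \mid S']$ for an arbitrary $S' \in \mathcal{C}$. After a possible relabeling, we may assume that $S' = \{1, \dots, K\}$, so we can write $Z = \sum_{i=1}^K \mathbb{1}_{\{i \in S\}}$. By Hölder's inequality,

$$\mathbb{E}[e^{\mu^2 Z}] = \mathbb{E}\left[\prod_{i=1}^K e^{\mu^2 \mathbb{1}_{\{i \in S\}}}\right] \le \prod_{i=1}^K \left(\mathbb{E}\left[e^{\mu^2 K \mathbb{1}_{\{i \in S\}}}\right]\right)^{1/K} = \frac{K}{n}\left(e^{\mu^2 K} - 1\right) + 1 \quad \text{(by assumption (ii))}.$$

By Proposition 5, $R^* \ge \delta$ as soon as $\frac{K}{n}(e^{\mu^2 K} - 1) \le 4(1 - \delta)^2$, which is equivalent to the stated condition on $\mu$. □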
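The condition of Proposition 6 is Proposition 5 combined with the Hölder bound, solved for $\mu$. A sketch computing both (names hypothetical):

```python
import math

def prop6_threshold(n: int, K: int, delta: float) -> float:
    """mu below which Proposition 6 guarantees R* >= delta."""
    return math.sqrt(math.log(1 + 4 * n * (1 - delta) ** 2 / K) / K)

def prop6_mgf_bound(n: int, K: int, mu: float) -> float:
    """Hölder bound E exp(mu^2 Z) <= (K/n)(exp(mu^2 K) - 1) + 1."""
    return K / n * (math.exp(mu * mu * K) - 1) + 1
```

At the threshold, the bound $1 - \frac{1}{2}\sqrt{\mathbb{E}e^{\mu^2 Z} - 1}$ evaluated with this mgf bound equals exactly $\delta$.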



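Surprisingly, the lower bound of Proposition 6 is close to optimal in many cases. This is true, in particular, when the class $\mathcal{C}$ is "small," as made precise in the following statement.

Corollary 7 Assume that $\mathcal{C}$ is symmetric in the sense of Proposition 6 and that it contains at most $n^\alpha$ elements, where $\alpha > 0$. Then $R^* \ge 1/2$ for all $\mu$ with

$$\mu \le \sqrt{\frac{1}{K}\log\left(1 + \frac{n}{K}\right)}$$

and $R^* \le 1/2$ for all $\mu$ with

$$\mu \ge \sqrt{\frac{2\alpha}{K}\log n} + 2\sqrt{\frac{2}{K}\log 4}.$$

Proof: The first statement follows from Proposition 6 with $\delta = 1/2$, while the second follows from Proposition 2 with $\delta = 1/2$ and the fact that $\mathbb{E}_0 \max_{S \in \mathcal{C}} X_S \le \sqrt{2K\log|\mathcal{C}|} \le \sqrt{2\alpha K\log n}$. □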



The corollary above shows that for any small and sufficiently symmetric class, the critical value of $\mu$ is of the order of $\sqrt{(\log n)/K}$, at least if $K \le n^\beta$ for some $\beta \in (0, 1)$. Later we will see examples of "large" classes for which Proposition 6 also gives a bound of the correct order of magnitude.

Negative association 

The bound of Proposition 6 may be improved significantly under an additional condition of negative association that is satisfied in several interesting examples (see Section 4 below). Recall that a collection $Y_1, \dots, Y_n$ of random variables is negatively associated if for any pair of disjoint sets $I, J \subset \{1, \dots, n\}$ and (coordinate-wise) non-decreasing functions $f$ and $g$,

$$\mathbb{E}[f(Y_i, i \in I)\, g(Y_j, j \in J)] \le \mathbb{E}[f(Y_i, i \in I)]\, \mathbb{E}[g(Y_j, j \in J)].$$
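Proposition 8 Let $\delta \in (0, 1)$ and assume that the class $\mathcal{C}$ satisfies the conditions of Proposition 6. Suppose that the labels are such that $S' = \{1, 2, \dots, K\} \in \mathcal{C}$. Let $S$ be a randomly chosen element of $\mathcal{C}$. If the random variables $\mathbb{1}_{\{1 \in S\}}, \dots, \mathbb{1}_{\{K \in S\}}$ are negatively associated then $R^* \ge \delta$ for all $\mu$ with

$$\mu \le \sqrt{\log\left(1 + \frac{n\log(1 + 4(1 - \delta)^2)}{K^2}\right)}.$$

Proof: We proceed similarly to the proof of Proposition 6. Since each factor $e^{\mu^2 \mathbb{1}_{\{i \in S\}}}$ is a non-decreasing function of a single indicator, we have

$$\mathbb{E}[e^{\mu^2 Z}] = \mathbb{E}\left[\prod_{i=1}^K e^{\mu^2 \mathbb{1}_{\{i \in S\}}}\right] \le \prod_{i=1}^K \mathbb{E}\left[e^{\mu^2 \mathbb{1}_{\{i \in S\}}}\right] \ \text{(by negative association)} \ = \left(\frac{K}{n}\left(e^{\mu^2} - 1\right) + 1\right)^K.$$

Proposition 5 and the upper bound above imply that $R^*$ is at least $\delta$ for all $\mu$ such that

$$\mu \le \sqrt{\log\left(1 + \frac{n}{K}\left((1 + 4(1 - \delta)^2)^{1/K} - 1\right)\right)}.$$

The result follows by using $e^y \ge 1 + y$ with $y = K^{-1}\log(1 + 4(1 - \delta)^2)$. □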
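The proof yields a slightly sharper condition before the final simplification $e^y \ge 1 + y$. The sketch below evaluates both forms so they can be compared (illustrative names):

```python
import math

def prop8_exact_threshold(n: int, K: int, delta: float) -> float:
    """mu from the last display of the proof (before using e^y >= 1 + y)."""
    inner = (1 + 4 * (1 - delta) ** 2) ** (1 / K) - 1
    return math.sqrt(math.log(1 + n / K * inner))

def prop8_threshold(n: int, K: int, delta: float) -> float:
    """mu from the statement of Proposition 8 (simpler and slightly smaller)."""
    return math.sqrt(math.log(1 + n * math.log(1 + 4 * (1 - delta) ** 2) / K ** 2))
```

At the exact threshold, the negative-association bound $(1 + \frac{K}{n}(e^{\mu^2} - 1))^K - 1$ equals $4(1 - \delta)^2$, so Proposition 5 gives exactly $R^* \ge \delta$.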



4 Examples 

In this section we consider various concrete examples and work out upper and lower bounds for the critical range of $\mu$.



4.1 Disjoint sets 

We start with the simplest possible case, that is, when all $S \in \mathcal{C}$ are disjoint (and therefore $KN \le n$). Fix $\delta \in (0, 1)$. Then, under $\mathbb{P}_0$, the $X_S$ are independent normal $\mathcal{N}(0, K)$ random variables and the bound $\mathbb{E}_0 \max_{S \in \mathcal{C}} X_S \le \sqrt{2K\log N}$ is close to being tight. By applying the maximum test $f$, we see that $R^* \le R(f) \le \delta$ whenever

$$\mu \ge \sqrt{\frac{2\log N}{K}} + 2\sqrt{\frac{2}{K}\log\frac{2}{\delta}}.$$

To see that this bound gives the correct order of magnitude, we may simply apply Proposition 5. Here $Z$ may take two values:

$$Z = K \text{ with probability } 1/N, \quad \text{and} \quad Z = 0 \text{ with probability } 1 - 1/N.$$

Thus,

$$\mathbb{E}e^{\mu^2 Z} - 1 = \frac{1}{N}\left(e^{\mu^2 K} - 1\right) \le \frac{1}{N}e^{\mu^2 K},$$

and therefore $R^* \ge \delta$ whenever

$$\mu \le \sqrt{\frac{\log(2N(1 - \delta)^2)}{K}}.$$

So in this case the critical transition occurs when $\mu$ is of the order of $\sqrt{(1/K)\log N}$. In Section 6 we use this simple lower bound to establish lower bounds for general classes $\mathcal{C}$ of sets. Note that in this simple case one may directly analyze the risk of the optimal test and obtain sharper bounds. In particular, the leading constant in the lower bound is suboptimal. However, in this paper our aim is to understand some general phenomena and we focus on orders of magnitude rather than on nailing down sharp constants.
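The two thresholds for disjoint sets can be compared numerically; the sketch below (hypothetical names) shows their ratio staying bounded, which is the sense in which the order $\sqrt{(1/K)\log N}$ is correct.

```python
import math

def disjoint_upper(N: int, K: int, delta: float) -> float:
    """mu above which the maximum test has R(f) <= delta (disjoint sets)."""
    return math.sqrt(2 * math.log(N) / K) + 2 * math.sqrt(2 / K * math.log(2 / delta))

def disjoint_lower(N: int, K: int, delta: float) -> float:
    """mu below which R* >= delta, from Proposition 5 with Z in {0, K}."""
    return math.sqrt(math.log(2 * N * (1 - delta) ** 2) / K)
```

As $N \to \infty$ with $K$ and $\delta$ fixed, the ratio of the lower to the upper threshold tends to $1/\sqrt{2}$.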



4.2 K-sets 

Consider the example when $\mathcal{C}$ contains all sets $S \subset \{1, \dots, n\}$ of size $K$. Thus, $N = \binom{n}{K}$. As mentioned in the introduction, this problem is very well understood, as sharp bounds and sophisticated tests are available; see, for example, Ingster [20], Baraud [5], Donoho and Jin [12]. We include it for illustration purposes only and we warn the reader that the obtained bounds are not the sharpest possible.

Let $\delta \in (0, 1)$. It is easy to see that the assumptions of Proposition 8 are satisfied and therefore $R^* \ge \delta$ for all

$$\mu \le \sqrt{\log\left(1 + \frac{n\log(1 + 4(1 - \delta)^2)}{K^2}\right)}.$$

This simple bound turns out to have the correct order of magnitude both when $n \gg K^2$ (in which case it is of the order of $\sqrt{\log(n/K^2)}$) and when $n \ll K^2$ (when it is of the order of $\sqrt{n/K^2}$).

This may be seen by considering the two simple tests described in Section 2 in the two different regimes. Since

$$\frac{\mathbb{E}_0 \max_{S \in \mathcal{C}} X_S}{K} \le \frac{\sqrt{2K\log\binom{n}{K}}}{K} \le \sqrt{2\log\frac{en}{K}},$$

we see from Proposition 2 that when $K = O(n^{(1-\varepsilon)/2})$ for some fixed $\varepsilon > 0$, the threshold value is of the order of $\sqrt{\log n}$. On the other hand, when $K^2/n$ is bounded away from zero, the lower bound above is of the order $\sqrt{n/K^2}$ and the averaging test provides a matching upper bound by Proposition 1.

Note that in this example the maximum test is easy to compute since it suffices to find the $K$ largest values among $X_1, \dots, X_n$.
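A sketch of the maximum test for $K$-sets: the scan statistic is the sum of the $K$ largest coordinates, computed here with `heapq.nlargest`. As an assumption, $\mathbb{E}_0 \max_S X_S$ is replaced by the upper bound $K\sqrt{2\log(en/K)}$ displayed above, which the remark after Proposition 2 allows; the function names are hypothetical.

```python
import heapq
import math
from typing import Sequence

def kset_scan_statistic(x: Sequence[float], K: int) -> float:
    """max over all K-subsets S of x_S: the sum of the K largest entries."""
    return sum(heapq.nlargest(K, x))

def kset_maximum_test(x: Sequence[float], K: int, mu: float) -> int:
    """Maximum test with E_0 max X_S replaced by the upper bound K sqrt(2 log(e n / K))."""
    n = len(x)
    emax_bound = K * math.sqrt(2 * math.log(math.e * n / K))
    return int(kset_scan_statistic(x, K) > (mu * K + emax_bound) / 2)
```

This avoids enumerating the $\binom{n}{K}$ sets and runs in $O(n\log K)$ time.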



4.3 Perfect matchings 

Let $\mathcal{C}$ be the set of all perfect matchings of the complete bipartite graph $K_{m,m}$. Thus, we have $n = m^2$ edges, $N = m!$, and $K = m$. By Proposition 1 (i.e., the averaging test), for $\delta \in (0, 1)$, one has $R(f) \le \delta$ whenever $\mu \ge \sqrt{8\log(2/\delta)}$.

To show that this bound has the right order of magnitude, we may apply Proposition 8. The symmetry assumptions hold obviously and the negative association property follows from the fact that $Z = |S \cap S'|$ has the same distribution as the number of fixed points in a random permutation. The proposition implies that for all $m$, $R^* \ge \delta$ whenever

$$\mu \le \sqrt{\log\left(1 + \log(1 + 4(1 - \delta)^2)\right)}.$$

Note that in this case the optimal test $f^*$ can be approximated in a computationally efficient way. To this end, observe that computing

$$\frac{1}{N}\sum_{S \in \mathcal{C}} e^{\mu X_S} = \frac{1}{m!}\sum_{\sigma} \prod_{j=1}^m e^{\mu X_{(j, \sigma(j))}}$$

(where the summation is over all permutations $\sigma$ of $\{1, \dots, m\}$ and $X_{(j, l)}$ denotes the component of $X$ on the edge joining $j$ and $l$) is equivalent to computing the permanent of an $m \times m$ matrix with non-negative elements. By a deep result of Jerrum, Sinclair, and Vigoda [21], this may be done by a polynomial-time randomized approximation.
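The permanent can be computed exactly only in exponential time. As a baseline for small $m$ (and explicitly not the randomized approximation of [21]), here is a sketch using Ryser's inclusion-exclusion formula, with `X[j][l]` assumed to hold the weight of edge $(j, l)$ of $K_{m,m}$ (0-based; names hypothetical):

```python
import math
from typing import Sequence

def permanent_ryser(A: Sequence[Sequence[float]]) -> float:
    """Exact permanent by Ryser's formula (O(2^m m^2) time).

    perm(A) = (-1)^m * sum over nonempty column sets T of (-1)^{|T|} prod_i sum_{j in T} A[i][j].
    """
    m = len(A)
    total = 0.0
    for mask in range(1, 1 << m):
        cols = [j for j in range(m) if mask >> j & 1]
        prod = 1.0
        for i in range(m):
            prod *= sum(A[i][j] for j in cols)
        total += (-1) ** len(cols) * prod
    return (-1) ** m * total

def matching_partition_function(X: Sequence[Sequence[float]], mu: float) -> float:
    """(1/m!) * sum over perfect matchings sigma of exp(mu * sum_j X[j][sigma(j)])."""
    m = len(X)
    W = [[math.exp(mu * X[j][l]) for l in range(m)] for j in range(m)]
    return permanent_ryser(W) / math.factorial(m)
```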



4.4 Stars 



Consider a network of $m$ nodes in which each pair of nodes interacts. One wishes to test if there is a corrupted node in the network whose interactions slightly differ from the rest. This situation may be modeled by considering the class of stars.

A star is a subgraph of the complete graph $K_m$ which contains all $K = m - 1$ edges containing a fixed vertex (see Figure 1). Consider the set $\mathcal{C}$ of all stars. In this setting, $n = \binom{m}{2}$ and $N = m$.

Figure 1: A star [5T].

In this case Corollary 7 (more precisely, Proposition 6, on which it is based) is applicable and we obtain that if $\mathcal{C}$ is the class of all stars in $K_m$ then for any $\varepsilon \in (0, 1)$,

$$\lim_{m \to \infty} R^* = 1 \quad \text{if} \quad \mu \le (1 - \varepsilon)\sqrt{\frac{\log m}{m}}.$$



4.5 Spanning trees 



Consider again a network of $m$ nodes in which each pair of nodes interacts. One may wish to test if there exists a corrupted connected subgraph containing each node. This leads us to considering the class of all spanning trees as follows.

Let $1, 2, \dots, n = \binom{m}{2}$ represent the edges of the complete graph $K_m$ and let $\mathcal{C}$ be the set of all spanning trees of $K_m$. Thus, we have $N = m^{m-2}$ spanning trees and $K = m - 1$ (see, e.g., [55]). By Proposition 1 the averaging test has risk $R(f) \le \delta$ whenever $\mu \ge \sqrt{\frac{4m}{m-1}\log(2/\delta)}$.

This bound is indeed of the right order. To see this, we may start with Proposition 5. There are (at least) two ways of proceeding. One is based on negative association. Even though Proposition 8 is not applicable because of the lack of symmetry in $\mathcal{C}$, negative association still holds. In particular, by a result of Feder and Mihail [14] (see also Grimmett and Winkler [17] and Benjamini, Lyons, Peres, and Schramm [6]), if $S$ is a random uniform spanning tree of $K_m$, then the indicators $\mathbb{1}_{\{1 \in S\}}, \dots, \mathbb{1}_{\{n \in S\}}$ are negatively associated.

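This means that, if $S$ and $S'$ are independent uniform spanning trees and $Z = |S \cap S'|$, then, since each edge belongs to a uniform spanning tree of $K_m$ with probability $(m-1)/\binom{m}{2} = 2/m$,

$$\mathbb{E}e^{\mu^2 Z} = \mathbb{E}\,\mathbb{E}\left[e^{\mu^2|S \cap S'|} \,\middle|\, S'\right] = \mathbb{E}\,\mathbb{E}\left[e^{\mu^2\sum_{i \in S'} \mathbb{1}_{\{i \in S\}}} \,\middle|\, S'\right] \le \mathbb{E}\prod_{i \in S'} \mathbb{E}\left[e^{\mu^2 \mathbb{1}_{\{i \in S\}}} \,\middle|\, S'\right] \quad \text{(by negative association)}$$

$$= \mathbb{E}\prod_{i \in S'}\left(\frac{2}{m}\left(e^{\mu^2} - 1\right) + 1\right) \le \exp\left(2\left(e^{\mu^2} - 1\right)\right).$$

This, together with Proposition 5, shows that for any $\delta \in (0, 1)$, $R^* \ge \delta$ whenever

$$\mu \le \sqrt{\log\left(1 + \frac{1}{2}\log(1 + 4(1 - \delta)^2)\right)}.$$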



We note here that the same bound can be proved in a completely different way that does not use negative association. The key is to note that we may generate the two random spanning trees based on $2(m-1)$ independent random variables $X_1, \dots, X_{2(m-1)}$ taking values in $\{1, \dots, m-1\}$ as in Aldous [1] (see also [9]). The key property we need is that if $Z_i$ denotes the number of common edges in the two spanning trees when $X_i$ is replaced by an independent copy $X_i'$ while keeping all other $X_j$'s fixed, then

$$\sum_{i=1}^{2(m-1)} (Z - Z_i)_+ \le Z$$

(the details are omitted). For random variables satisfying this last property, an inequality of Boucheron, Lugosi, and Massart [8] implies the sub-Poissonian bound

$$\mathbb{E}e^{\mu^2 Z} \le e^{\mathbb{E}Z\,(e^{\mu^2} - 1)}.$$

Clearly, $\mathbb{E}Z = 2(m-1)/m < 2$, so essentially the same bound as above is obtained.

As the bounds above show, the computationally trivial averaging test has close-to-optimal performance. In spite of this, one may wish to use the optimal test $f^*$. The "partition function" $(1/N)\sum_{S \in \mathcal{C}} e^{\mu X_S}$ may be computed by an algorithm of Propp and Wilson [25], who introduced a random sampling algorithm that, given a graph with non-negative weights $w_i$ over the edges, samples a random spanning tree from a distribution such that the probability of any spanning tree $S$ is proportional to $\prod_{i \in S} w_i$. The expected running time of the algorithm is bounded by the cover time of an associated Markov chain that is defined as a random walk over the graph in which the transition probabilities are proportional to the edge weights. If $\mu$ is of the order of a constant (as in the critical range) then the cover time is easily shown to be polynomial (with high probability), as all edge weights $w_i = e^{\mu X_i}$ are roughly of the same order both under the null and under the alternative hypotheses.
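As an aside not taken from the text: for spanning trees, the weighted sum $\sum_{S}\prod_{i \in S} w_i$ can also be computed exactly and deterministically by Kirchhoff's matrix-tree theorem, as a cofactor of the weighted Laplacian. A minimal sketch, assuming a symmetric weight matrix `W` with `W[u][v]` $= e^{\mu X_{uv}}$ (names hypothetical); dividing the result by $N = m^{m-2}$ gives the normalized partition function.

```python
from typing import List, Sequence

def det(M: List[List[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    k = len(A)
    result = 1.0
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, k):
            factor = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= factor * A[c][j]
    return result

def spanning_tree_partition_function(W: Sequence[Sequence[float]]) -> float:
    """Sum over spanning trees S of K_m of prod_{edges (u, v) in S} W[u][v].

    Matrix-tree theorem: any cofactor of the weighted Laplacian L = D - W.
    W must be symmetric with non-negative entries; its diagonal is ignored.
    """
    m = len(W)
    L = [[(sum(W[u][v] for v in range(m) if v != u) if u == w else -W[u][w])
          for w in range(m)] for u in range(m)]
    return det([row[1:] for row in L[1:]])  # delete row 0 and column 0
```

With unit weights this recovers Cayley's count $m^{m-2}$.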

4.6 Cliques 

Another natural application is the class of all cliques of a certain size in a complete graph. More precisely, the random variables $X_1, \dots, X_n$ are associated with the edges of the complete graph $K_m$ such that $\binom{m}{2} = n$, and $\mathcal{C}$ contains all cliques of size $k$. Thus, $K = \binom{k}{2}$ and $N = \binom{m}{k}$. This case is more difficult than the class of K-sets discussed above because negative association does not hold anymore. Also, computationally, the class of cliques is much more complex. A related, well-studied model starts with the random subgraph of $K_m$ containing each edge independently with probability 1/2 as null hypothesis. The alternative hypothesis is the same as the null hypothesis, except that there is a clique of size $k$ on which each edge is independently present with probability $p > 1/2$. This is called the "hidden clique" problem (usually only the special case $p = 1$ is considered). Despite substantial interest in the hidden clique problem, polynomial time detection algorithms are only known when $k = \Omega(\sqrt{m})$ [2, 15]. We may obtain the hidden clique model from our model by thresholding at weight zero (retaining only edges whose normal random variable is positive), and so our model is easier for testing than the hidden clique model. However, it seems likely that designing an efficient test in the normal setting will be as difficult as it has proved for hidden cliques. It would be of interest to construct near-optimal tests that are computable in polynomial time for larger values of $k$.

We have the following bounds for the performance of the optimal test. They show that when $k$ is at most of the order of $\sqrt{m}$, the critical value of $\mu$ is of the order of $\sqrt{(1/k)\log(m/k)}$. The proof below may be adjusted to handle larger values of $k$ as well, but we prefer to keep the calculations more transparent.
Proposition 9 Let $\mathcal{C}$ represent the class of all $N = \binom{m}{k}$ cliques of size $k$ of a complete graph $K_m$ and assume that $k \le \sqrt{m(\log 2)/e}$. Then

(i) for all $\delta \in (0, 1)$, $R^* \le \delta$ whenever

$$\mu \ge 2\sqrt{\frac{1}{k-1}\log\frac{em}{k}} + 4\sqrt{\frac{\log(2/\delta)}{k(k-1)}};$$

(ii) $R^* \ge 1/2$ whenever

$$\mu \le \sqrt{\frac{1}{4k}\log\frac{m}{k}}.$$



Proof: (i) follows by a straightforward application of Proposition 2 and the bound $\mathbb{E}_0 \max_{S \in \mathcal{C}} X_S \le \sqrt{2K\log N}$, together with $K = k(k-1)/2$ and $\log N \le k\log(em/k)$.

To prove the lower bound (ii), by Proposition 5 it suffices to show that if $S, S'$ are $k$-cliques drawn randomly and independently from $\mathcal{C}$ and $Z$ denotes the number of edges in the intersection of $S$ and $S'$, then $\mathbb{E}[\exp(\mu^2 Z)] \le 2$ for the indicated values of $\mu$.

Because of symmetry, $\mathbb{E}[\exp(\mu^2 Z)] = \mathbb{E}[\exp(\mu^2 Z) \mid S']$ for all $S'$, and therefore we might as well fix an arbitrary clique $S'$. If $Y$ denotes the number of vertices in the clique $S \cap S'$, then $Z = \binom{Y}{2}$. Moreover, the distribution of $Y$ is hypergeometrical with parameters $m$ and $k$. If $B$ is a binomial random variable with parameters $k$ and $k/m$, then since $\exp(\mu^2 x^2/2)$ is a convex function of $x$, an inequality of Hoeffding [TH] implies that

$$\mathbb{E}[\exp(\mu^2 Z)] = \mathbb{E}\left[\exp\left(\mu^2\binom{Y}{2}\right)\right] \le \mathbb{E}[\exp(\mu^2 Y^2/2)] \le \mathbb{E}[\exp(\mu^2 B^2/2)].$$

Thus, it remains to derive an appropriate upper bound for the moment generating function of the squared binomial. To this end, let $c > 1$ be a parameter whose value will be specified later. Using

$$B^2 \le c\frac{k^2}{m}B + kB\,\mathbb{1}_{\{B > ck^2/m\}}$$

(recall that $B \le k$) and the Cauchy-Schwarz inequality, it suffices to show that

$$\mathbb{E}\left[\exp\left(\mu^2 c\frac{k^2}{m}B\right)\right] \cdot \mathbb{E}\left[\exp\left(\mu^2 kB\,\mathbb{1}_{\{B > ck^2/m\}}\right)\right] \le 4. \qquad (2)$$

We show that, if $\mu$ satisfies the condition of (ii), for an appropriate choice of $c$, both terms on the left-hand side are at most 2.

The first term on the left-hand side of (2) is

$$\mathbb{E}\left[\exp\left(\mu^2 c\frac{k^2}{m}B\right)\right] = \left(1 + \frac{k}{m}\left(\exp\left(\mu^2 c\frac{k^2}{m}\right) - 1\right)\right)^k,$$

which is at most 2 if and only if

$$\frac{k}{m}\left(\exp\left(\mu^2 c\frac{k^2}{m}\right) - 1\right) \le 2^{1/k} - 1.$$

Since $2^{1/k} - 1 \ge (\log 2)/k$, this is implied by

$$\mu \le \sqrt{\frac{m}{ck^2}\log\left(1 + \frac{m\log 2}{k^2}\right)}.$$

To bound the second term on the left-hand side of (2), note that

$$\mathbb{E}\left[\exp\left(\mu^2 kB\,\mathbb{1}_{\{B > ck^2/m\}}\right)\right] \le 1 + \mathbb{E}\left[\mathbb{1}_{\{B > ck^2/m\}}\exp(\mu^2 kB)\right] \le 1 + \mathbb{P}\left\{B > c\frac{k^2}{m}\right\}^{1/2}\left(\mathbb{E}[\exp(2\mu^2 kB)]\right)^{1/2}$$

by the Cauchy-Schwarz inequality, so it suffices to show that

$$\mathbb{P}\left\{B > c\frac{k^2}{m}\right\} \cdot \mathbb{E}[\exp(2\mu^2 kB)] \le 1.$$

Denoting $h(x) = (1 + x)\log(1 + x) - x$, Chernoff's bound implies

$$\mathbb{P}\left\{B > c\frac{k^2}{m}\right\} \le \exp\left(-\frac{k^2}{m}h(c - 1)\right).$$

On the other hand,

$$\mathbb{E}[\exp(2\mu^2 kB)] = \left(1 + \frac{k}{m}\left(\exp(2\mu^2 k) - 1\right)\right)^k \le \left(1 + \frac{k}{m}\exp(2\mu^2 k)\right)^k,$$

and therefore the second term on the left-hand side of (2) is at most 2 whenever

$$1 + \frac{k}{m}\exp(2\mu^2 k) \le \exp\left(\frac{k}{m}h(c - 1)\right).$$

Using $\exp\left(\frac{k}{m}h(c - 1)\right) \ge 1 + \frac{k}{m}h(c - 1)$, we obtain the sufficient condition

$$\mu \le \sqrt{\frac{1}{2k}\log h(c - 1)}.$$

Summarizing, we have shown that $R^* \ge 1/2$ for all $\mu$ satisfying

$$\mu \le \min\left\{\sqrt{\frac{1}{2k}\log h(c - 1)},\ \sqrt{\frac{m}{ck^2}\log\left(1 + \frac{m\log 2}{k^2}\right)}\right\}.$$

Choosing

$$c = \frac{4m\log(m\log 2/k^2)}{k\log(m/k)}$$

(which is greater than 1 for $k \le \sqrt{m(\log 2)/e}$, since then $\log(m\log 2/k^2) \ge 1$ and $m/k > \log(m/k)$), the second term on the right-hand side is at least $\sqrt{(1/(4k))\log(m/k)}$. Now observe that since $h(c - 1) = c\log c - c + 1$ is convex, for any $a > 0$, $h(c - 1) \ge c\log a - a + 1$. Choosing $a = e^2$, the first term is at least

$$\sqrt{\frac{1}{2k}\log\left(2c - e^2 + 1\right)} \ge \sqrt{\frac{1}{2k}\log\sqrt{\frac{m}{k}}} = \sqrt{\frac{1}{4k}\log\frac{m}{k}},$$

where we used the condition $m\log 2/k^2 \ge e$ and that $x > 2\log x$ for all $x > 0$ (which give $c \ge 4\sqrt{m/k}$ and hence $2c - e^2 + 1 \ge \sqrt{m/k}$). □



Remark (a related problem). A closely related problem arises in the exploratory analysis of microarray data (see Shabalin, Weigman, Perou, and Nobel). There, each member of C represents the K edges of a $\sqrt{K} \times \sqrt{K}$ biclique of the complete bipartite graph $K_{m,m}$, where $m = \sqrt{n}$. (A biclique is a complete bipartite subgraph of $K_{m,m}$.) The analysis and the bounds are completely analogous to the one worked out above; the details are omitted.
5 On the monotonicity of the risk



Intuitively, one would expect that the testing problem becomes harder as the class C gets larger. More precisely, one may expect that if $A \subset C$ are two classes of subsets of $\{1,\dots,n\}$, then $R^*_A(\mu) \le R^*_C(\mu)$ holds for all $\mu$. The purpose of this section is to show that this intuition is wrong in quite a strong sense. Not only does such a general monotonicity property fail for the risk, but there are classes $A \subset C$ for which $R^*_A(\mu)$ is arbitrarily close to 1 and $R^*_C(\mu)$ is arbitrarily close to 0 for the same value of $\mu$.

However, monotonicity does hold if the class C is sufficiently symmetric. Call a class C symmetric if, for the optimal test

$$f^*_C(x) = \mathbb{1}\left\{\frac{1}{N}\sum_{S\in C}e^{\mu x_S} > e^{\mu^2K/2}\right\},\qquad x_S = \sum_{i\in S}x_i,$$

the value of $\mathbb{P}_T\{f^*_C(X) = 0\}$ is the same for all $T \in C$.
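Here is a minimal Python sketch of the optimal test $f^*_C$ defined above, for a class given as an explicit list of index sets. The function name `optimal_test` and the toy class are illustrative assumptions. The toy class consists of all 2-subsets of $\{0,1,2,3\}$, which is symmetric under permutations of the coordinates. For large values of $\mu x_S$ one would compare logarithms instead, to avoid overflow.

```python
import math
from itertools import combinations


def optimal_test(x, sets, mu):
    """Return 1 (reject H0) iff (1/N) sum_S exp(mu * x_S) > exp(mu^2 K / 2).

    x    : observed vector (sequence of floats), indexed from 0
    sets : list of equal-size index tuples (the class C)
    mu   : signal strength mu > 0
    """
    k = len(sets[0])
    # x_S is the sum of the coordinates of x that belong to S.
    avg = sum(math.exp(mu * sum(x[i] for i in s)) for s in sets) / len(sets)
    return int(avg > math.exp(mu * mu * k / 2))


C = list(combinations(range(4), 2))  # all 2-subsets of {0, 1, 2, 3}
print(optimal_test([0.0, 0.0, 0.0, 0.0], C, 1.0))  # 0: accept H0
print(optimal_test([3.0, 3.0, 0.0, 0.0], C, 1.0))  # 1: reject H0
```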

Theorem 10 Let C be a symmetric class of subsets of $\{1,\dots,n\}$. If A is an arbitrary subclass of C, then for all $\mu > 0$, $R^*_A(\mu) \le R^*_C(\mu)$.

Proof: In this proof we fix the value of $\mu > 0$ and suppress it in the notation. Recall the definition of the alternative risk measure

$$\bar R_C(f) = \mathbb{P}_0\{f(X) = 1\} + \max_{S\in C}\mathbb{P}_S\{f(X) = 0\},$$

which is to be contrasted with our main risk measure

$$R_C(f) = \mathbb{P}_0\{f(X) = 1\} + \frac{1}{N}\sum_{S\in C}\mathbb{P}_S\{f(X) = 0\}.$$

The risk $\bar R$ is obviously monotone in the sense that if $A \subset C$, then for every $f$, $\bar R_A(f) \le \bar R_C(f)$. Let $f^*_C$ and $\bar f_C$ denote the optimal tests with respect to $R_C$ and $\bar R_C$, respectively.

First observe that if C is symmetric, then $\bar R_C(f^*_C) = R_C(f^*_C)$, since all type II errors of $f^*_C$ coincide. But since $R_C(f) \le \bar R_C(f)$ for every $f$, we have

$$\bar R_C(\bar f_C) \le \bar R_C(f^*_C) = R_C(f^*_C) \le R_C(\bar f_C) \le \bar R_C(\bar f_C).$$

This means that all inequalities are equalities. In particular, $\bar R_C(\bar f_C) = R_C(f^*_C) = R^*_C$.

Now if A is an arbitrary subclass of C, then

$$R^*_C = R_C(f^*_C) = \bar R_C(\bar f_C) \ge \bar R_A(\bar f_C) \ge R_A(\bar f_C) \ge R_A(f^*_A) = R^*_A,$$

which completes the proof. □



Theorem 11 For every $\epsilon \in (0,1)$ there exist $n$, $\mu$, and classes $A \subset C$ of subsets of $\{1,\dots,n\}$ such that $R^*_A(\mu) \ge 1 - \epsilon$ and $R^*_C(\mu) \le 2\epsilon$.

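Proof: We work with $L_1$ distances. For any class $\mathcal{L}$, denote $\phi_{\mathcal{L}}(x) = \frac{1}{|\mathcal{L}|}\sum_{S\in\mathcal{L}}\phi_S(x)$, where $\phi_S$ is the density of $\mathbb{P}_S$ and $\phi_0$ that of $\mathbb{P}_0$. Recall that

$$R^*_{\mathcal{L}}(\mu) = 1 - \frac12\int|\phi_0(x) - \phi_{\mathcal{L}}(x)|\,dx.$$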
Given $\epsilon$, we fix an integer $K = K(\epsilon)$ large enough that $K + 1 \ge 1/\epsilon$ and that

$$\sqrt{\frac{\log\left(2(K+1)(1-\epsilon)^2\right)}{K+1}} \ge \sqrt{\frac{8}{K}\log\frac{2}{\epsilon}},$$

and let $n = n(\epsilon) = (K+1)^2$. We let A consist of $K+1$ disjoint subsets of $\{1,\dots,n\}$, each of size $K+1$. We let B consist of all sets of the form $\{1,\dots,K,i\}$, where $i$ ranges from $K+1$ to $n$, and assume A has been chosen so that $A \cap B = \emptyset$. We then let $C = A \cup B$. We take

$$\mu = \sqrt{\frac{\log\left(2(K+1)(1-\epsilon)^2\right)}{K+1}},$$
so that, as seen in Section 4.1, we have $R^*_A(\mu) \ge 1 - \epsilon$. We will require an upper bound on $R^*_B(\mu)$, which we obtain by considering the averaging test on variables $1,\dots,K$,

$$f(x) = \mathbb{1}\left\{\sum_{i=1}^K x_i > \frac{\mu K}{2}\right\}.$$



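Just as in the proposition on the averaging test, we have $R(f) \le \epsilon$ whenever $\mu \ge \sqrt{(8/K)\log(2/\epsilon)}$, which is indeed the case by our choices of $\mu$ and $K$. It follows that $R^*_B(\mu) \le \epsilon$. We remark that

$$\int|\phi_0 - \phi_B| = 2 - 2R^*_B(\mu) \ge 2 - 2\epsilon.$$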
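The error bound for the averaging test can be checked directly. We assume, as in our reading of the proof, that the test rejects when $\sum_{i=1}^K x_i > \mu K/2$. Its type I error then equals $\mathbb{P}\{N(0,1) > \mu\sqrt{K}/2\}$. Its type II error under any $\mathbb{P}_S$ with $S \in B$ is the same, since every such $S$ contains $\{1,\dots,K\}$. The sketch below compares the sum of the two errors with $2e^{-\mu^2K/8}$, which equals $\epsilon$ at $\mu = \sqrt{(8/K)\log(2/\epsilon)}$. The values of $K$ and $\epsilon$ are arbitrary.

```python
import math


def gaussian_upper_tail(t):
    """P{N(0,1) > t}."""
    return 0.5 * math.erfc(t / math.sqrt(2))


def averaging_test_risk(mu, K):
    """Type I + type II error of 1{sum_{i<=K} x_i > mu K / 2} against sets containing {1..K}."""
    t = mu * math.sqrt(K) / 2
    return 2 * gaussian_upper_tail(t)


def mu_threshold(K, eps):
    """mu >= sqrt((8/K) log(2/eps)) guarantees 2 exp(-mu^2 K / 8) <= eps."""
    return math.sqrt(8 / K * math.log(2 / eps))


K, eps = 50, 0.05
mu = mu_threshold(K, eps)
print(averaging_test_risk(mu, K), 2 * math.exp(-mu * mu * K / 8))
```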
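We let $M = |B| = (K+1)^2 - K$; then $N = |C| = M + K + 1 = (K+1)^2 + 1$. We use three facts: $M/N \ge 1 - \epsilon$, $(K+1)/N \le 1/(K+1) \le \epsilon$, and $\int|\phi_0 - \phi_A| = 2 - 2R^*_A(\mu) \le 2\epsilon$. With these,

$$\begin{aligned}
\int|\phi_0 - \phi_C| &= \int\left|\phi_0 - \frac{(K+1)\phi_A + M\phi_B}{N}\right| = \int\frac{\left|(K+1)(\phi_0 - \phi_A) + M(\phi_0 - \phi_B)\right|}{N}\\
&\ge \frac{M}{N}\int|\phi_0 - \phi_B| - \frac{K+1}{N}\int|\phi_0 - \phi_A| \ge (1-\epsilon)\int|\phi_0 - \phi_B| - 2\epsilon^2\\
&= (1-\epsilon)\left(2 - 2R^*_B(\mu)\right) - 2\epsilon^2 \ge 2 - 4\epsilon.
\end{aligned}$$

Thus, $R^*_C(\mu) \le 2\epsilon$.

□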
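The combinatorial bookkeeping in the proof of Theorem 11 can be verified on small instances. The sketch below builds one admissible choice of $A$ together with $B$ and checks the sizes used above. For $A$ it takes the columns of a $(K+1)\times(K+1)$ grid filled row by row; this is a hypothetical choice of ours that keeps $A \cap B = \emptyset$ when $K \ge 2$.

```python
def build_classes(K):
    """Build the classes A and B of the proof on {1, ..., n}, n = (K + 1)^2.

    A: K + 1 disjoint sets of size K + 1, taken (hypothetically) to be the columns of a
       (K + 1) x (K + 1) grid filled row by row; for K >= 2 no column contains {1, ..., K},
       so A and B share no set.
    B: all sets {1, ..., K, i} with i = K + 1, ..., n.
    """
    if K < 2:
        raise ValueError("this particular choice of A needs K >= 2")
    n = (K + 1) ** 2
    A = [frozenset(1 + r * (K + 1) + col for r in range(K + 1)) for col in range(K + 1)]
    head = frozenset(range(1, K + 1))
    B = [head | {i} for i in range(K + 1, n + 1)]
    return n, A, B


n, A, B = build_classes(3)
print(n, len(A), len(B), len(A) + len(B))  # 16 4 13 17
```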



Observe that non-monotonicity of the Bhattacharyya affinity also follows from the same argument. To this end, write $\rho_C(\mu) = \frac12\int\sqrt{\phi_0(x)\phi_C(x)}\,dx$. In terms of the Hellinger distance $H(\phi_0,\phi_C) = \left(\int(\sqrt{\phi_0} - \sqrt{\phi_C})^2\right)^{1/2}$, this becomes $\rho_C(\mu) = \frac12 - \frac14H(\phi_0,\phi_C)^2$. Recall (see, e.g., Devroye and Györfi, p. 225) that

$$H(\phi_0,\phi_C)^2 \le \int|\phi_0(x) - \phi_C(x)|\,dx \le 2H(\phi_0,\phi_C).$$

Hence the same example as in the proof above, for n large enough, shows the non-monotonicity of the Bhattacharyya affinity as well.
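These relations between the Bhattacharyya affinity, the Hellinger distance and the $L_1$ distance hold for any pair of densities. The sketch below checks them on random discrete distributions, an assumption made only so that the integrals become finite sums. It uses the normalization $\rho = \frac12\int\sqrt{\phi_0\phi_C}$ of the text.

```python
import math
import random


def hellinger(p, q):
    """H(p, q) = sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2) for probability vectors."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def l1(p, q):
    """L1 distance sum_i |p_i - q_i|."""
    return sum(abs(a - b) for a, b in zip(p, q))


def bhattacharyya(p, q):
    """rho = (1/2) sum_i sqrt(p_i q_i), matching the normalization used in the text."""
    return 0.5 * sum(math.sqrt(a * b) for a, b in zip(p, q))


def random_pmf(rng, size):
    """A random probability vector of the given length."""
    w = [rng.random() for _ in range(size)]
    s = sum(w)
    return [x / s for x in w]


rng = random.Random(0)
for _ in range(1000):
    p, q = random_pmf(rng, 6), random_pmf(rng, 6)
    H = hellinger(p, q)
    assert H * H <= l1(p, q) + 1e-12          # H^2 <= L1
    assert l1(p, q) <= 2 * H + 1e-12          # L1 <= 2H
    assert abs(bhattacharyya(p, q) - (0.5 - H * H / 4)) < 1e-12  # rho = 1/2 - H^2/4
```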



6 Lower bounds based on random subclasses and metric entropy

In this section we derive lower bounds for the Bayes risk $R^* = R^*_C(\mu)$. The bounds are in terms of some geometric features of the class C. Again, we treat C as a metric space equipped with the canonical distance $d(S,T) = \sqrt{\mathbb{E}_0(X_S - X_T)^2}$ (i.e., the square root of the Hamming distance $d_H(S,T)$).

For an integer $M \le N$ we define a real-valued parameter $t_C(M) > 0$ of the class C as follows. Let $A \subset C$ be obtained by choosing M elements of C at random, without replacement. Let the random variable $\tau$ denote the smallest distance between elements of A and let $t_C(M)$ be a median of $\tau$.
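Since $t_C(M)$ is defined through a random subclass, it can be estimated by simulation. The Monte Carlo sketch below draws M elements without replacement, records the minimum canonical distance $\tau$, and returns the sample median. The function `estimate_t`, the number of trials and the toy class (all 3-subsets of a 10-element set) are our own illustrative assumptions.

```python
import math
import random
import statistics
from itertools import combinations


def canonical_distance(s, t):
    """d(S, T) = sqrt(|S ^ T|), the square root of the Hamming distance."""
    return math.sqrt(len(s ^ t))


def estimate_t(C, M, trials=200, seed=0):
    """Monte Carlo estimate of t_C(M): a median of the minimum pairwise distance tau
    among M elements of C drawn uniformly without replacement (requires M >= 2)."""
    rng = random.Random(seed)
    taus = []
    for _ in range(trials):
        A = rng.sample(C, M)
        taus.append(min(canonical_distance(s, t) for s, t in combinations(A, 2)))
    return statistics.median(taus)


# Toy class (hypothetical): all 3-subsets of {0, ..., 9}, so K = 3 and d(S, T)^2 <= 2K = 6.
C = [frozenset(s) for s in combinations(range(10), 3)]
print(estimate_t(C, 5))
```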
Theorem 12 Let $M \le N$ be an integer. Then for any class C,

$$R^*_C \ge 1/4$$

whenever

$$\mu \le \min\left\{\sqrt{\frac{\log(M/16)}{K}},\ \sqrt{\frac{8\log(8/\sqrt{3})}{K - t_C(M)^2/2}}\right\}.$$
To interpret the statement of the theorem, note that

$$K - \frac{\tau^2}{2} = \max_{S,T\in A,\ S\ne T}|S\cap T|$$

is the largest overlap between any pair of elements of A. Thus, just like in Theorem 5, the distribution of the overlap between random elements of C plays a key role in establishing lower bounds for the optimal risk. However, in Theorem 5 it is the moment generating function $\mathbb{E}\exp(\mu^2|S\cap T|)$ of the overlap between two random elements that determines an upper bound for the critical value of $\mu$. Here, by contrast, it is the median of the largest overlap between many random elements that counts. The latter seems to carry more information about the fine geometry of the class. In fact, invoking a simple union bound, upper bounds for $\mathbb{E}\exp(\mu^2|S\cap T|)$ may be used together with Theorem 12.



In applications it often suffices to consider the following special case.

Corollary 13 Let $M \le N$ be the largest integer for which zero is a median of $\max_{S,T\in A,\,S\ne T}|S\cap T|$, where A is a random subset of C of size M (i.e., $t_C(M)^2 = 2K$). Then $R^*_C(\mu) \ge 1/4$ for all $\mu \le \sqrt{\log(M/16)/K}$.



Example (sub-squares of a grid). To illustrate the corollary, consider the following example, which is the simplest in a family of problems investigated by Arias-Castro, Candès, and Durand [1]. Assume that n and K are both perfect squares and that the indices $\{1,\dots,n\}$ are arranged in a $\sqrt{n} \times \sqrt{n}$ grid. The class C contains all $\sqrt{K} \times \sqrt{K}$ sub-squares. Now if S and T are randomly chosen elements of C (with or without replacement) then, if $(K+1)^2 \le 2\sqrt{n}$,



P{|5nT| ^ 0} > 



K 



(y/n - K+ l) 2 ' (y/n - K + l) 2 



> 



K 



and therefore 



max |snr| = (o = i-: 

S.TeA 



max \Sr\T\ > 

S,T£A 



> 1 — M 



,K 



which is at least 1/2 if $M \le \sqrt{n/(2K)}$, in which case $t_C(M)^2 = 2K$. Thus, by Corollary 13, $R^*_C(\mu) \ge 1/4$ for all $\mu \le \sqrt{\log(n/(512K))/(2K)}$. This bound is of the optimal order of magnitude, as is easily seen by an application of Proposition 2.
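For small grids the overlap probability in the sub-square example can be computed exactly by enumeration. The sketch below uses hypothetical grid and square side lengths. It computes $\mathbb{P}\{S\cap T \ne \emptyset\}$ for $S, T$ drawn independently and uniformly from $C$. It then compares this with the elementary bound $(2\sqrt{K}-1)^2/(\sqrt{n}-\sqrt{K}+1)^2$, which we derive ourselves: a fixed sub-square meets at most $(2\sqrt{K}-1)^2$ of the $(\sqrt{n}-\sqrt{K}+1)^2$ sub-squares.

```python
def subsquares(side_n, side_k):
    """All side_k x side_k sub-squares of a side_n x side_n grid, as frozensets of (row, col)."""
    starts = range(side_n - side_k + 1)
    return [
        frozenset((r + i, c + j) for i in range(side_k) for j in range(side_k))
        for r in starts
        for c in starts
    ]


def overlap_probability(side_n, side_k):
    """P{S and T intersect} for S, T independent uniform elements of the class."""
    C = subsquares(side_n, side_k)
    hits = sum(1 for S in C for T in C if S & T)
    return hits / len(C) ** 2


side_n, side_k = 12, 2  # n = 144, K = 4 (illustrative values)
p = overlap_probability(side_n, side_k)
bound = (2 * side_k - 1) ** 2 / (side_n - side_k + 1) ** 2
print(p, bound)
```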

In some other applications a better bound is obtained if some overlap is allowed. A case in point is the example of stars from Section 4.4. In that case any two elements of C overlap, but by taking $M = N\ (= m)$, we have $K - t_C(M)^2/2 = 1$, so Theorem 12 still implies $R^*_C(\mu) \ge 1/4$ whenever $\mu \le \sqrt{(1/K)\log(m/16)}$.



The main tool of the proof of Theorem 12 is Slepian's lemma, which we recall here [27]. (For this version, see Ledoux and Talagrand [23, Theorem 3.11].)



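Lemma 14 (Slepian's lemma) Let $\xi = (\xi_1,\dots,\xi_N)$ and $\zeta = (\zeta_1,\dots,\zeta_N) \in \mathbb{R}^N$ be zero-mean Gaussian vectors such that

$$\mathbb{E}\xi_i^2 = \mathbb{E}\zeta_i^2 \quad\text{for each } i = 1,\dots,N \qquad\text{and}\qquad \mathbb{E}\xi_i\xi_j \le \mathbb{E}\zeta_i\zeta_j \quad\text{for all } i \ne j.$$

Let $F: \mathbb{R}^N \to \mathbb{R}$ be such that for all $x \in \mathbb{R}^N$ and $i \ne j$,

$$\frac{\partial^2 F}{\partial x_i\,\partial x_j}(x) \le 0.$$

Then $\mathbb{E}F(\xi) \ge \mathbb{E}F(\zeta)$.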



Proof of Theorem 12: Let $M \le N$ be fixed and choose M sets from C uniformly at random (without replacement). Let A denote the random subclass of C obtained this way. Denote the likelihood ratio associated to this class by

$$L_A(X) = \frac{1}{M}\sum_{S\in A}\frac{\phi_S(X)}{\phi_0(X)} = \frac{1}{M}\sum_{S\in A}V_S,$$

where $V_S = e^{\mu X_S - K\mu^2/2}$. Then the optimal risk of the class C may be lower bounded by

$$R^*_C(\mu) - R^*_A(\mu) = \frac12\left(\mathbb{E}_0|L_A(X) - 1| - \mathbb{E}_0|L_C(X) - 1|\right) \ge -\frac12\,\mathbb{E}_0|L_A(X) - L_C(X)|.$$

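Denoting by $\mathbf{E}$ expectation with respect to the random choice of A, we have

$$\begin{aligned}
R^*_C(\mu) &\ge \mathbf{E}R^*_A(\mu) - \frac12\,\mathbf{E}\,\mathbb{E}_0\left|\frac1M\sum_{S\in A}V_S - \frac1N\sum_{S\in C}V_S\right|\\
&\ge \mathbf{E}R^*_A(\mu) - \frac12\sqrt{\mathbb{E}_0\,\mathbf{E}\left(\frac1M\sum_{S\in A}V_S - \frac1N\sum_{S\in C}V_S\right)^2}\\
&\ge \mathbf{E}R^*_A(\mu) - \frac12\sqrt{\frac1M\,\mathbb{E}_0\,\frac1N\sum_{T\in C}\left(V_T - \frac1N\sum_{S\in C}V_S\right)^2}
\end{aligned}$$

(since the variance of a sample mean without replacement is less than that with replacement).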



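An easy way to bound the right-hand side is by writing

$$\begin{aligned}
\mathbb{E}_0\left(V_T - \frac1N\sum_{S\in C}V_S\right)^2 &\le 2\,\mathbb{E}_0(V_T - 1)^2 + 2\,\mathbb{E}_0\left(1 - \frac1N\sum_{S\in C}V_S\right)^2\\
&\le 2\,\mathbb{E}_0(V_T - 1)^2 + \frac2N\sum_{S\in C}\mathbb{E}_0(1 - V_S)^2\\
&= 4\operatorname{Var}(V_T) = 4\left(e^{\mu^2K} - 1\right).
\end{aligned}$$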



Summarizing, we have

$$R^*_C(\mu) \ge \mathbf{E}R^*_A(\mu) - \sqrt{\frac{e^{\mu^2K} - 1}{M}} \ge \mathbf{E}R^*_A(\mu) - \frac14,$$

where we used the assumption that $\mu \le \sqrt{(1/K)\log(M/16)}$. Thus, it suffices to prove that $\mathbf{E}R^*_A(\mu) \ge 1/2$.
We bound the optimal risk associated with A in terms of the Bhattacharyya affinity

$$\rho_A(\mu) = \frac12\,\mathbb{E}_0\sqrt{\frac{\frac1M\sum_{S\in A}\phi_S(X)}{\phi_0(X)}} = \frac12\,\mathbb{E}_0\sqrt{\frac1M\sum_{S\in A}V_S}.$$
Recall from Section 5 that $R^*_A(\mu) \ge 1 - \sqrt{1 - 4\rho_A(\mu)^2}$. Since $\sqrt{1 - 4x^2}$ is concave, we have

$$\mathbf{E}R^*_A(\mu) \ge 1 - \sqrt{1 - 4\left(\mathbf{E}\rho_A(\mu)\right)^2}.$$

Therefore, it suffices to show that the expected Bhattacharyya affinity $\mathbf{E}\rho_A(\mu)$ corresponding to the random class A satisfies

$$\mathbf{E}\rho_A(\mu) = \frac12\,\mathbf{E}\,\mathbb{E}_0\sqrt{\frac1M\sum_{S\in A}V_S} \ge \frac{\sqrt3}{4}.$$



In the argument below we fix the random class A, relabel the elements so that $A = \{1,2,\dots,|A|\}$, and bound $\rho_A(\mu)$ from below. Denote the minimum distance between any two elements of A by $\tau$. To bound $\rho_A(\mu)$, we apply Slepian's lemma with the function

$$F(x) = \frac12\sqrt{\frac{1}{|A|}\sum_{S=1}^{|A|}e^{\mu x_S - K\mu^2/2}},$$

where $x = (x_1,\dots,x_{|A|})$. A simple calculation shows that the mixed second partial derivatives of F are negative, so Slepian's lemma is indeed applicable.

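Next we introduce the random vectors $\xi$ and $\zeta$. Let the components of $\xi$ be indexed by elements $S \in A$ and define $\xi_S = X_S = \sum_{i\in S}X_i$. Thus, under $\mathbb{P}_0$, each $\xi_S$ is normal $(0,K)$ and $\mathbb{E}F(\xi)$ is just the Bhattacharyya affinity $\rho_A(\mu)$. To define the random vector $\zeta$, introduce $|A| + 1$ independent standard normal random variables: one variable $G_S$ for each $S \in A$ and an extra variable $G_0$. Recall that the definition of $\tau$ guarantees that the minimal distance between any two elements of A is at least $\tau$. Now let

$$\zeta_S = \sqrt{K - \tau^2/2}\;G_0 + \frac{\tau}{\sqrt2}\,G_S,\qquad S \in A.$$

Then clearly for each $S, T \in A$, $\mathbb{E}\zeta_S^2 = K$ and $\mathbb{E}\zeta_S\zeta_T = K - \tau^2/2$ ($S \ne T$). On the other hand, $\mathbb{E}\xi_S^2 = K$ and

$$\mathbb{E}\xi_S\xi_T = |S\cap T| = K - \frac{d(S,T)^2}{2} \le K - \frac{\tau^2}{2} = \mathbb{E}\zeta_S\zeta_T.$$

Therefore, by Slepian's lemma, $\rho_A(\mu) = \mathbb{E}F(\xi) \ge \mathbb{E}F(\zeta)$. However,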



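$$\begin{aligned}
\mathbb{E}F(\zeta) &= \frac12\,\mathbb{E}\sqrt{\frac{1}{|A|}\sum_{S\in A}e^{\mu\zeta_S - K\mu^2/2}}\\
&= \mathbb{E}\left[e^{\mu\sqrt{K - \tau^2/2}\,G_0/2 - (K - \tau^2/2)\mu^2/4}\right]\cdot\frac12\,\mathbb{E}\sqrt{\frac{1}{|A|}\sum_{S\in A}e^{\mu\tau G_S/\sqrt2 - \tau^2\mu^2/4}}\\
&= e^{-\mu^2(K - \tau^2/2)/8}\cdot\frac12\,\mathbb{E}\sqrt{\frac{1}{|A|}\sum_{S\in A}e^{\mu\tau G_S/\sqrt2 - \tau^2\mu^2/4}}.
\end{aligned}$$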
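The covariance comparison behind the application of Slepian's lemma can be checked exactly for a concrete class. The sketch below uses a hypothetical toy class with $K = 3$. It forms the covariance $|S \cap T|$ of $\xi_S = X_S$ under $\mathbb{P}_0$. It then forms the covariance of $\zeta_S = \sqrt{K - \tau^2/2}\,G_0 + (\tau/\sqrt2)\,G_S$ from its loadings on $(G_0, G_S)$. Finally it verifies the two hypotheses of Lemma 14.

```python
import math
from itertools import combinations


def xi_cov(sets):
    """Covariance of xi_S = X_S under P_0: E[xi_S xi_T] = |S & T|."""
    return [[len(s & t) for t in sets] for s in sets]


def zeta_cov(sets):
    """Covariance of zeta_S = sqrt(K - tau^2/2) G_0 + (tau/sqrt(2)) G_S.

    tau^2 is the minimum Hamming distance |S ^ T| over distinct members of `sets`.
    """
    K = len(sets[0])
    tau2 = min(len(s ^ t) for s, t in combinations(sets, 2))
    m = len(sets)
    # Row i holds the loadings of zeta_i on (G_0, G_1, ..., G_m).
    loadings = [
        [math.sqrt(K - tau2 / 2)] + [math.sqrt(tau2 / 2) if j == i else 0.0 for j in range(m)]
        for i in range(m)
    ]
    return [[sum(a * b for a, b in zip(u, v)) for v in loadings] for u in loadings]


def slepian_hypotheses_hold(sets, tol=1e-9):
    """Equal variances and E[xi_S xi_T] <= E[zeta_S zeta_T] for all S != T."""
    X, Z = xi_cov(sets), zeta_cov(sets)
    m = len(sets)
    same_var = all(abs(X[i][i] - Z[i][i]) < tol for i in range(m))
    dominated = all(X[i][j] <= Z[i][j] + tol for i in range(m) for j in range(m) if i != j)
    return same_var and dominated


A = [frozenset(s) for s in [(0, 1, 2), (0, 3, 4), (5, 6, 7), (1, 5, 8)]]
print(slepian_hypotheses_hold(A))  # True
```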



To finish the proof, it suffices to observe that the factor $\frac12\mathbb{E}\sqrt{\cdots}$ in the last expression is the Bhattacharyya affinity corresponding to a class of disjoint sets, all of size $\tau^2/2$, of cardinality $|A| = M$. This case has been handled in the first example of Section 4, where we showed that



E. 



\A\^ A ~ A ~ 2VM 
where again we used the condition $\mu \le \sqrt{\log(M/16)/K}$ and the fact that $\tau^2/2 \le K$. Therefore, under this condition on $\mu$, we have that for any fixed A,

pM = V(C) > 3 e -, 2 (^-v 2 )/8 

and therefore 

where $t_C(M)$ is the median of $\tau$. This concludes the proof. □



Remark (an improvement). At the price of losing a constant factor in the statement of Theorem 12, one may replace the parameter $t_C(M)$ by a larger quantity. The idea is that by thinning the random subclass A one may consider a subset of A that has better separation properties. More precisely, for an even integer $M \le N$ we may define a real-valued parameter $\bar t_C(M) > 0$ of the class C as follows. Let $A \subset C$ be obtained by choosing M elements of C at random, without replacement. Order the elements $S_1,\dots,S_M$ of A such that

$$\min_{i\ne1}d(S_1,S_i) \ge \min_{i\ne2}d(S_2,S_i) \ge \cdots \ge \min_{i\ne M}d(S_M,S_i)$$

and define the subset $\bar A \subset A$ by $\bar A = \{S_1,\dots,S_{M/2}\}$. Let the random variable $\bar\tau$ denote the smallest distance between elements of $\bar A$ and let $\bar t_C(M)$ be the median of $\bar\tau$. It is easy to see that the proof of Theorem 12 goes through, and one may replace $t_C(M)$ by $\bar t_C(M)$ (adjusting the constants appropriately). One simply needs to observe that, since each $V_S$ is non-negative,

$$\rho_A(\mu) = \frac12\,\mathbb{E}_0\sqrt{\frac{1}{|A|}\sum_{S\in A}V_S} \ge \frac12\,\mathbb{E}_0\sqrt{\frac{1}{|A|}\sum_{S\in\bar A}V_S} = \frac{1}{\sqrt2}\,\rho_{\bar A}(\mu).$$

If $\bar t_C(M)$ is significantly larger than $t_C(M)$, then the gain may be substantial.



If the class C is symmetric, then, thanks to Theorem 10, the theorem above can be improved and simplified. Instead of having to work with randomly chosen subclasses, one may optimally choose a separated subset. The bounds can then be expressed in terms of the metric entropy of C, more precisely, by its packing numbers with respect to the canonical distance $d(S,T) = \sqrt{\mathbb{E}_0(X_S - X_T)^2}$.

We say that $A \subset C$ is a t-separated set (or t-packing) if for any distinct $S, T \in A$, $d(S,T) \ge t$. For $t \le \sqrt{2K}$, define the packing number $M(t)$ as the size of a largest t-separated subset A of C. It is a simple well-known fact that packing numbers are closely related to the covering numbers introduced in Section 2 by the inequalities $N(t) \le M(t) \le N(t/2)$.
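A greedy construction illustrates the relation between packings and covers. A maximal t-separated subset is automatically a t-cover, which gives the first inequality $N(t) \le M(t)$. The sketch below uses the canonical distance $d(S,T) = \sqrt{|S \triangle T|}$; the toy class and the values of $t$ are illustrative. The greedy packing is maximal but not necessarily of maximum size, so its cardinality is only a lower bound on $M(t)$.

```python
import math
from itertools import combinations


def greedy_packing(C, t):
    """Greedily build a maximal t-separated subset of C (d(S, T) = sqrt(|S ^ T|)).

    'Maximal' means no further element of C can be added; the result need not have
    the largest possible size, so len(...) is only a lower bound on M(t).
    """
    packing = []
    for S in C:
        if all(math.sqrt(len(S ^ T)) >= t for T in packing):
            packing.append(S)
    return packing


def is_cover(C, centers, t):
    """True if every element of C is within distance t of some center."""
    return all(any(math.sqrt(len(S ^ T)) <= t for T in centers) for S in C)


# Toy class (hypothetical): all 3-subsets of {0, ..., 7}.
C = [frozenset(s) for s in combinations(range(8), 3)]
print(len(greedy_packing(C, 2.0)))
```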



Theorem 15 Let C be symmetric in the sense of Theorem 10 and let $t < \sqrt{2K}$. Then

$$R^*_C \ge 1/2$$

whenever

$$\mu \le \min\left\{\sqrt{\frac{\log(M(t)/16)}{K}},\ \sqrt{\frac{8\log(2/\sqrt{3})}{K - t^2/2}}\right\}.$$

Proof: Let $A \subset C$ be a t-separated subclass of maximum size $M(t)$. Since C is symmetric, by Theorem 10, $R^*_C \ge R^*_A$, so it suffices to show that $R^*_A \ge 1/2$ for the indicated values of $\mu$. The rest of the proof is identical to that of Theorem 12. □
To interpret this result, take $t = \sqrt{2K(1-\epsilon)}$ for some $\epsilon \in (0,1/2)$. Then, by the theorem, $R^* \ge 1/2$ if

$$\mu \le \min\left\{\sqrt{\frac{\log\left(M(\sqrt{2K(1-\epsilon)})/16\right)}{K}},\ \sqrt{\frac{8\log(2/\sqrt{3})}{K\epsilon}}\right\}.$$

As an example, suppose that the class C is such that there exists a constant $V > 0$ such that $M(t) \approx (n/t^2)^V$. (Recall that all classes with VC dimension V have an upper bound of this form for the packing numbers; see the Remark on p. 5.) In this case one may choose $\epsilon \approx 1/(V\log(n/K))$ and obtain a sufficient condition of the form $\mu \le c\sqrt{(V/K)\log(n/K)}$ (for some constant c). This closely matches the bound obtained for the maximum test by Dudley's chaining bound.




7 Optimal versus maximum test: an analysis of the type I error 

In all examples considered above, upper bounds for the optimal risk $R^*$ are derived by analyzing either the maximum test or the averaging test. As the examples show, very often these simple tests have near-optimal performance. The optimal test $f^*$ is generally more difficult to study. In this section we analyze directly the performance of the optimal test. More precisely, we derive general upper bounds for the type I error of $f^*$, i.e., the probability that the null hypothesis is rejected under $\mathbb{P}_0$. The upper bound involves the expected value of the maximum of a Gaussian process indexed by a sparse subset of C. It can be significantly smaller than the maximum over the whole class that appears in the performance bound of the maximum test in Proposition 2. Unfortunately, we do not have an analogous bound for the type II error.

We consider the type I error of the optimal test $f^*$:

$$\mathbb{P}_0\{f^*(X) = 1\} = \mathbb{P}_0\{L(X) > 1\} = \mathbb{P}_0\left\{\frac1N\sum_{S\in C}e^{\mu X_S} > e^{\mu^2K/2}\right\}.$$

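An easy bound is $\frac1N\sum_{S\in C}e^{\mu X_S} \le e^{\mu\max_{S\in C}X_S}$, so

$$\mathbb{P}_0\{L(X) > 1\} \le \mathbb{P}_0\left\{\max_{S\in C}X_S > \frac{\mu K}{2}\right\}.$$

Thus, by Tsirelson's inequality, $\mathbb{P}_0\{L(X) > 1\} \le \delta$ whenever $\mu \ge (2/K)\,\mathbb{E}_0\max_{S\in C}X_S + \sqrt{(8/K)\log(1/\delta)}$. Of course, we already know this from Proposition 2, where this bound was derived for the (suboptimal) test based on maxima.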

In order to understand the difference between the performance of the optimal test $f^*$ and the maximum test, one needs to compare the random variables $\frac1\mu\log\left(\frac1N\sum_{S\in C}e^{\mu X_S}\right)$ and $\max_{S\in C}X_S$.
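The comparison between these two random variables rests on the elementary inequalities $\max_S x_S - (\log N)/\mu \le \frac1\mu\log\left(\frac1N\sum_S e^{\mu x_S}\right) \le \max_S x_S$. The sketch below computes the middle quantity stably, by factoring out the maximum, and checks these bounds on simulated values. The simulated Gaussian draws are independent, which is a simplifying assumption: the actual $X_S$ are correlated.

```python
import math
import random


def soft_max(values, mu):
    """(1/mu) log((1/N) sum exp(mu v)), computed stably by factoring out the max."""
    top = max(values)
    n = len(values)
    return top + math.log(sum(math.exp(mu * (v - top)) for v in values) / n) / mu


rng = random.Random(1)
K, N, mu = 9, 50, 0.7
xs = [rng.gauss(0.0, math.sqrt(K)) for _ in range(N)]  # hypothetical values standing in for X_S
sm = soft_max(xs, mu)
print(max(xs) - math.log(N) / mu <= sm <= max(xs))
```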

Proposition 16 For any $\delta \in (0,1)$, the type I error of the optimal test $f^*$ satisfies

$$\mathbb{P}_0\{f^*(X) = 1\} \le \delta$$

whenever 

2 1 32 log(2M) 

where A is any $(\sqrt{K}/2)$-cover of C.



If A is a minimal $(\sqrt{K}/2)$-cover of C, then $(1/K)\,\mathbb{E}_0\max_{S\in A}X_S \le \sqrt{2\log N(\sqrt{K}/2)/K}$. By "Sudakov's minoration" (see Ledoux and Talagrand [23, Theorem 3.18]), this upper bound is sharp up to a constant factor.

It is instructive to compare this bound with that of Proposition 2 for the performance of the maximum test. In Proposition 16 we were able to replace the expected maximum $\mathbb{E}_0\max_{S\in C}X_S$ by $\mathbb{E}_0\max_{S\in A}X_S$, where now the maximum is taken over a potentially much smaller subset $A \subset C$. It is not difficult to construct examples where there is a substantial difference, even in the order of magnitude, between the two expected maxima, so we have a genuine gain over the simple upper bound of Proposition 2. Unfortunately, we do not know if an analogous upper bound holds for the type II error $\frac1N\sum_{S\in C}\mathbb{P}_S\{f^*(X) = 0\}$ of the optimal test $f^*$.



Proof: Introduce the notation

$$M_C(\mu) = \mathbb{E}_0\,\frac1\mu\log\left(\frac1N\sum_{S\in C}e^{\mu X_S}\right).$$



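Then

$$\mathbb{P}_0\{f^*(X) = 1\} = \mathbb{P}_0\left\{\frac1\mu\log\left(\frac1N\sum_{S\in C}e^{\mu X_S}\right) - M_C(\mu) > \frac{\mu K}{2} - M_C(\mu)\right\}.$$

We use Tsirelson's inequality (Lemma 3) to bound this probability. To this end, we need to show that the function $h: \mathbb{R}^n \to \mathbb{R}$ defined by

$$h(x) = \frac1\mu\log\left(\frac1N\sum_{S\in C}e^{\mu x_S}\right)$$

is Lipschitz (where $x = (x_1,\dots,x_n)$). Observing that

$$\frac{\partial h}{\partial x_j}(x) = \frac{\sum_{S\in C:\,j\in S}e^{\mu x_S}}{\sum_{S\in C}e^{\mu x_S}} \in [0,1],$$

we have

$$\|\nabla h(x)\|^2 = \sum_{j=1}^n\left(\frac{\partial h}{\partial x_j}(x)\right)^2 \le \sum_{j=1}^n\frac{\partial h}{\partial x_j}(x) = K,$$

and therefore h is indeed Lipschitz with constant $\sqrt{K}$. By Tsirelson's inequality, whenever $\mu K/2 > M_C(\mu)$, we have

$$\mathbb{P}_0\{f^*(X) = 1\} \le \exp\left(-\frac{\left(\mu K/2 - M_C(\mu)\right)^2}{2K}\right).$$

Thus, the type I error is bounded by $\delta$ if

$$\mu \ge \frac{2M_C(\mu)}{K} + \sqrt{\frac8K\log\frac1\delta}.$$

It remains to bound $M_C(\mu)$.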
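The Lipschitz computation above can be checked numerically. The sketch below implements $h$ and its gradient for a hypothetical random class with $K = 3$. Coordinate $j$ of the gradient is the total softmax weight of the sets containing $j$. The sketch checks $\|\nabla h\|^2 \le K$ and $\sum_j \partial h/\partial x_j = K$. The accompanying test compares the gradient with central finite differences.

```python
import math
import random


def h(x, sets, mu):
    """h(x) = (1/mu) log((1/N) sum_S exp(mu x_S)), with x_S = sum_{i in S} x_i."""
    xs = [sum(x[i] for i in s) for s in sets]
    top = max(xs)
    return top + math.log(sum(math.exp(mu * (v - top)) for v in xs) / len(xs)) / mu


def grad_h(x, sets, mu):
    """Partial derivative in x_j: total softmax weight of the sets containing j."""
    xs = [sum(x[i] for i in s) for s in sets]
    top = max(xs)
    w = [math.exp(mu * (v - top)) for v in xs]
    total = sum(w)
    g = [0.0] * len(x)
    for weight, s in zip(w, sets):
        for i in s:
            g[i] += weight / total
    return g


rng = random.Random(2)
n, K, mu = 8, 3, 1.3
sets = [tuple(rng.sample(range(n), K)) for _ in range(10)]  # hypothetical class
x = [rng.gauss(0, 1) for _ in range(n)]
g = grad_h(x, sets, mu)
print(sum(v * v for v in g) <= K, abs(sum(g) - K) < 1e-9)
```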

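Let $0 < t \le \sqrt{2K}$ and consider a minimal t-cover of the set C, that is, a set $A \subset C$ with cardinality $|A| = N(t)$ with the following property. If $\pi(S)$ denotes an element in A whose distance to $S \in C$ is minimal, then $d(S,\pi(S)) \le t$ for all $S \in C$. Then clearly,

$$M_C(\mu) \le \mathbb{E}_0\,\frac1\mu\log\left(\frac1N\sum_{S\in C}e^{\mu(X_S - X_{\pi(S)})}\right) + \mathbb{E}_0\max_{S\in A}X_S.$$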

To bound the first term on the right-hand side, note that, by Jensen's inequality,

$$\mathbb{E}_0\,\frac1\mu\log\left(\frac1N\sum_{S\in C}e^{\mu(X_S - X_{\pi(S)})}\right) \le \frac1\mu\log\left(\frac1N\sum_{S\in C}\mathbb{E}_0\,e^{\mu(X_S - X_{\pi(S)})}\right) \le \frac{\mu t^2}{2}.$$

This holds because, for each S, $d_H(S,\pi(S)) = d(S,\pi(S))^2 \le t^2$, and therefore $X_S - X_{\pi(S)}$ is a centered normal random variable with variance $d_H(S,\pi(S))$. For the second term we have

$$\mathbb{E}_0\max_{S\in A}X_S \le \sqrt{2K\log N(t)}.$$

Choosing $t^2 = K/4$, we obtain the proposition. □